BACKGROUND OF THE INVENTION 

Field of the Invention 

[0007] This invention relates generally to digital communications, and more 

particularly, to digital coding (or compression) of speech and/or audio signals. 

Related Art 

[0008] In speech or audio coding, the coder encodes the input speech or audio 

signal into a digital bit stream for transmission or storage, and the decoder 
decodes the bit stream into an output speech or audio signal. The combination 
of the coder and the decoder is called a codec. 

[0009] In the field of speech coding, predictive coding is a very popular 

technique. Prediction of the input waveform is used to remove redundancy 
from the waveform, and instead of quantizing an input speech waveform 
directly, a residual signal waveform is quantized. The predictor(s) used in 
predictive coding can be either backward adaptive or forward adaptive 
predictors. Backward adaptive predictors do not require any side information 
as they are derived from a previously quantized waveform, and therefore can 
be derived at a decoder. On the other hand, forward adaptive predictor(s) 
require side information to be transmitted to the decoder as they are derived 
from the input waveform, which is not available at the decoder. 

[0010] In the field of speech coding, two types of predictors are commonly 

used. A first type of predictor is called a short-term predictor. It is aimed at 
removing redundancy between nearby samples in the input waveform. This is 
equivalent to removing a spectral envelope of the input waveform. A second 
type of predictor is often referred to as a long-term predictor. It removes 
redundancy between samples further apart, typically spaced by a time 
difference that is constant for a suitable duration. For speech, this time 
difference is typically equivalent to a local pitch period of the speech signal, 
and consequently the long-term predictor is often referred to as a pitch predictor. 
The long-term predictor removes a harmonic structure of the input waveform. 
A residual signal remaining after the removal of redundancy by the 
predictor(s) is quantized along with any information needed to reconstruct the 
predictor(s) at the decoder. 
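
As an illustration of the two predictor types described above, the following Python sketch computes a short-term prediction residual and a one-tap long-term (pitch) prediction residual. The function names, the one-tap form of the long-term predictor, the fixed coefficients, and the zero initial history are assumptions made for this example only; they are not taken from any particular codec.

```python
def short_term_residual(s, a):
    """Return d(n) = s(n) - sum_{i=1..M} a[i-1] * s(n-i), assuming zero history."""
    M = len(a)
    d = []
    for n in range(len(s)):
        prediction = sum(a[i - 1] * s[n - i] for i in range(1, M + 1) if n - i >= 0)
        d.append(s[n] - prediction)
    return d


def long_term_residual(d, pitch_period, gain):
    """Return e(n) = d(n) - gain * d(n - pitch_period), assuming zero history."""
    e = []
    for n in range(len(d)):
        prediction = gain * d[n - pitch_period] if n >= pitch_period else 0.0
        e.append(d[n] - prediction)
    return e


# A pulse train with a period of 5 samples stands in for voiced speech.
pulses = [1.0 if n % 5 == 0 else 0.0 for n in range(20)]
pitch_residual = long_term_residual(pulses, pitch_period=5, gain=1.0)

# A decaying exponential is perfectly predictable from its previous sample.
decay = [0.5 ** n for n in range(8)]
envelope_residual = short_term_residual(decay, [0.5])
```

In both cases only the first, unpredictable portion of the signal survives in the residual, which is what makes the residual cheaper to quantize than the input waveform.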

This quantization of the residual signal provides a series of bits 
representing a compressed version of the residual signal. This compressed 
version of the residual signal is often denoted the excitation signal and is used 
to reconstruct an approximation of the input waveform at the decoder in 
combination with the predictor(s). Generating the series of bits representing 
the excitation signal is commonly denoted excitation quantization and 
generally requires the search for, and selection of, a best or preferred candidate 
excitation among a set of candidate excitations with respect to some cost 
function. The search and selection require a number of mathematical 
operations to be performed, which translates into a certain computational 
complexity when the operations are implemented on a signal processing 
device. It is advantageous to minimize the number of mathematical operations 
in order to minimize a power consumption, and maximize a processing 
bandwidth, of the signal processing device. 

Excitation quantization in predictive coding can be based on a sample- 
by-sample quantization of the excitation. This is referred to as Scalar 
Quantization (SQ). Techniques for performing Scalar Quantization of the 
excitation are relatively simple, and thus, the computational complexity 
associated with SQ is relatively manageable. 

Alternatively, the excitation can be quantized based on groups of 
samples. Quantizing groups of samples is often referred to as Vector 
Quantization (VQ), and when applied to the excitation, simply as excitation 
VQ. The use of VQ can provide superior performance to SQ, and may be 
necessary when the number of coding bits per residual signal sample becomes 
small (typically less than two bits per sample). Also, VQ can provide a greater 
flexibility in bit-allocation as compared to SQ, since a fractional number of 
bits per sample can be used. However, excitation VQ can be relatively 
complex when compared to excitation SQ. Therefore, there is a need to reduce 
the complexity of excitation VQ as used in a predictive coding environment. 
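
The following Python sketch illustrates a plain exhaustive VQ search and the fractional bit rate that VQ permits. The codebook contents, the squared-error cost, and the function names are assumptions chosen for illustration; an actual excitation codebook and cost function would differ.

```python
import math


def vq_search(target, codebook):
    """Exhaustive search: return (index, squared error) of the closest codevector."""
    best_index, best_error = -1, math.inf
    for index, codevector in enumerate(codebook):
        error = sum((t - c) ** 2 for t, c in zip(target, codevector))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


def bits_per_sample(codebook_size, dimension):
    """Bits spent per sample when one index selects a whole vector."""
    return math.log2(codebook_size) / dimension


# Hypothetical 8-entry codebook of 2-dimensional codevectors.
codebook = [[1, 1], [1, -1], [-1, 1], [-1, -1], [0, 0], [2, 0], [0, 2], [-2, 0]]
index, error = vq_search([1.8, 0.3], codebook)
rate = bits_per_sample(len(codebook), 2)  # 3 bits for 2 samples
```

The search visits every codevector and evaluates the full error for each one, which is the cost that more efficient search methods aim to reduce.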
[0014] One type of predictive coding is Noise Feedback Coding (NFC), 

wherein noise feedback filtering is used to shape coding noise, in order to 
improve a perceptual quality of quantized speech. Therefore, it would be 
advantageous to use excitation VQ with noise feedback coding, and further, to 
do so in a computationally efficient manner. 



SUMMARY OF THE INVENTION 



Summary 

[0100] The present invention is directed to first and second efficient excitation 

VQ search methods using correlation techniques, for use in predictive, noise 
feedback coding of a speech or audio signal. The first and second methods of 
the present invention are described below in Section IX.C, in connection with 
FIGs. 18, 19, and 20. The first and second methods of the present invention 
may be used independently or jointly. The first method (described below in 
Section IX.C.1) provides an efficient VQ search method for a general VQ 
codebook, that is, no particular structure of the VQ codebook is assumed. On 
the other hand, the second method (described below in Section IX.C.2) 
provides an efficient method for the excitation quantization in the case where a 
signed VQ codebook is used for the excitation. 

[0101] The first method reduces the complexity of the excitation VQ in NFC 

by reorganizing a calculation of an energy of an error vector for each of a 
plurality of candidate excitation vectors, each also referred to as a codebook vector. 
The energy of the error vector is the cost function that is minimized during the 
search of the excitation codebook. The reorganization is obtained by: 

1. Expanding a Mean Squared Error (MSE) term of the error 

vector; 

2. Excluding an energy term that is invariant to the candidate 

excitation vector; and 

3. Pre-computing energy terms of ZERO-STATE responses of the 
candidate excitation vectors that are invariant to sub-vectors of a subframe. 
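
The three steps listed above can be sketched in Python as follows. This is a minimal illustration, assuming the error vector is the difference between a target vector and the ZERO-STATE response of a candidate codevector; the sign convention, the example data, and the function names are assumptions of this sketch rather than details taken from the detailed description.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def brute_force_search(target, responses):
    """Minimize the full error energy ||target - response||^2 directly."""
    costs = [sum((t - z) ** 2 for t, z in zip(target, resp)) for resp in responses]
    return costs.index(min(costs))


def correlation_search(target, responses, energies):
    """Minimize energy - 2 * correlation.

    Expanding ||target - response||^2 gives
    ||target||^2 - 2 * (target . response) + ||response||^2;
    the first term is the same for every candidate, so it is dropped.
    """
    costs = [e - 2 * dot(target, resp) for resp, e in zip(responses, energies)]
    return costs.index(min(costs))


# Hypothetical ZERO-STATE responses of four candidate codevectors.
responses = [[1, 0, 0], [0, 2, 1], [1, 1, 1], [-1, 2, 0]]
# The response energies are computed once and reused for every sub-vector
# of the subframe, on the assumption that the responses do not change within it.
energies = [dot(resp, resp) for resp in responses]

subframe_targets = [[1, 1, 2], [0, 2, 1], [-1, 1, 0]]
selected = [correlation_search(t, responses, energies) for t in subframe_targets]
```

Per candidate, the reorganized search needs one correlation and one subtraction instead of a full error-vector computation, while selecting the same codevector as the direct search.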

[0102] The second method presents an efficient way of searching the 

excitation codebook in the case where a signed codebook is used. The second 
method reorganizes the calculation of the energy of the error vector in such a 
way that only half of the total number of codevectors is searched. 
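
A minimal Python sketch of the idea follows, assuming a signed codebook that contains every shape vector together with its negative, and an error vector defined as target minus response. The data, names, and sign convention are illustrative assumptions, not the exact procedure of the detailed description.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def full_signed_search(target, shapes):
    """Reference search: test every codevector, i.e. both +shape and -shape."""
    best = None
    for index, shape in enumerate(shapes):
        for sign in (1, -1):
            cost = sum((t - sign * z) ** 2 for t, z in zip(target, shape))
            if best is None or cost < best[0]:
                best = (cost, index, sign)
    return best[1], best[2]


def half_signed_search(target, shapes, energies):
    """Visit each shape once; the sign follows from the sign of the correlation."""
    best = None
    for index, (shape, energy) in enumerate(zip(shapes, energies)):
        corr = dot(target, shape)
        # Cost of the better of +shape and -shape, with ||target||^2 dropped.
        cost = energy - 2 * abs(corr)
        if best is None or cost < best[0]:
            best = (cost, index, 1 if corr >= 0 else -1)
    return best[1], best[2]


# Hypothetical shape vectors; the signed codebook is these plus their negatives.
shapes = [[1, 0, 0], [0, 2, 1], [1, 1, 1], [-1, 2, 0]]
energies = [dot(shape, shape) for shape in shapes]
choice = half_signed_search([-1, -3, -1], shapes, energies)
```

Because a shape and its negative have the same energy and correlations of opposite sign, one pass over the shapes covers both halves of the signed codebook.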

[0015] The combination of the first and second methods also provides an 

efficient search. However, there may be circumstances where the first and 
second methods are used separately. For example, if a signed codebook is not 
used, then only the first method applies. 

[0016] As mentioned above, the first and second excitation VQ search 

methods of the present invention (described in connection with FIGs. 18, 19, 
and 20) are used with NFC systems. For example, the methods of the present 
invention are useable with the exemplary NFC systems, structures, and 
methods described in connection with FIGs. 1-17, to the extent excitation VQ 
is used in these systems, structures, and methods. 



Terminology 

Predictor: 

A predictor P as referred to herein predicts a current signal value (e.g., 
a current sample) based on previous or past signal values (e.g., past samples). 
A predictor can be a short-term predictor or a long-term predictor. A short- 
term signal predictor (e.g., a short term speech predictor) can predict a current 
signal sample (e.g., speech sample) based on adjacent signal samples from the 
immediate past. With respect to speech signals, such "short-term" predicting 
removes redundancies between, for example, adjacent or close-in signal 
samples. A long-term signal predictor can predict a current signal sample 
based on signal samples from the relatively distant past. With respect to a 
speech signal, such "long-term" predicting removes redundancies between 
relatively distant signal samples. For example, a long-term speech predictor 
can remove redundancies between distant speech samples due to a pitch 
periodicity of the speech signal. 
[0018] The phrase "a predictor P predicts a signal s(n) to produce a signal 

ps(n)" means the same as the phrase "a predictor P makes a prediction ps(n) of 
a signal s(n)." Also, a predictor can be considered equivalent to a predictive 
filter that predictively filters an input signal to produce a predictively filtered 
output signal. 

Coding noise and filtering thereof: 

[0019] Often, a speech signal can be characterized in part by spectral 

characteristics (i.e., the frequency spectrum) of the speech signal. Two known 
spectral characteristics include 1) what is referred to as a harmonic fine 
structure or line frequencies of the speech signal, and 2) a spectral envelope of 
the speech signal. The harmonic fine structure includes, for example, pitch 
harmonics, and is considered a long-term (spectral) characteristic of the 
speech signal. On the other hand, the spectral envelope of the speech signal is 
considered a short-term (spectral) characteristic of the speech signal. 

[0020] Coding a speech signal can cause audible noise when the encoded 

speech is decoded by a decoder. The audible noise arises because the coded 
speech signal includes coding noise introduced by the speech coding process, 
for example, by quantizing signals in the encoding process. The coding noise 
can have spectral characteristics (i.e., a spectrum) different from the spectral 
characteristics (i.e., spectrum) of natural speech (as characterized above). 
Such audible coding noise can be reduced by spectrally shaping the coding 
noise (i.e., shaping the coding noise spectrum) such that it corresponds to or 
follows to some extent the spectral characteristics (i.e., spectrum) of the 
speech signal. This is referred to as "spectral noise shaping" of the coding 
noise, or "shaping the coding noise spectrum." The coding noise is shaped to 
follow the speech signal spectrum only "to some extent" because it is not 
necessary for the coding noise spectrum to exactly follow the speech signal 
spectrum. Rather, the coding noise spectrum is shaped sufficiently to reduce 
audible noise, thereby improving the perceptual quality of the decoded speech. 

Accordingly, shaping the coding noise spectrum (i.e., spectrally 
shaping the coding noise) to follow the harmonic fine structure (i.e., long-term 
spectral characteristic) of the speech signal is referred to as "harmonic noise 
(spectral) shaping" or "long-term noise (spectral) shaping." Also, shaping the 
coding noise spectrum to follow the spectral envelope (i.e., short-term spectral 
characteristic) of the speech signal is referred to as "short-term noise (spectral) 
shaping" or "envelope noise (spectral) shaping." 

Noise feedback filters can be used to spectrally shape the coding noise 
to follow the spectral characteristics of the speech signal, so as to reduce the 
above mentioned audible noise. For example, a short-term noise feedback 
filter can short-term filter coding noise to spectrally shape the coding noise to 
follow the short-term spectral characteristic (i.e., the envelope) of the speech 
signal. On the other hand, a long-term noise feedback filter can long-term 
filter coding noise to spectrally shape the coding noise to follow the long-term 
spectral characteristic (i.e., the harmonic fine structure or pitch harmonics) of 
the speech signal. Therefore, short-term noise feedback filters can effect 
short-term or envelope noise spectral shaping of the coding noise, while long- 
term noise feedback filters can effect long-term or harmonic noise spectral 
shaping of the coding noise, in the present invention. 



BRIEF DESCRIPTION OF THE DRAWINGS 



[0023] The present invention is described with reference to the accompanying 

drawings. In the drawings, like reference numbers indicate identical or 
functionally similar elements. 

[0024] FIG. 1 is a block diagram of a first conventional noise feedback coding 

structure or codec. 

[0025] FIG. 1A is a block diagram of an example NFC structure or codec 

using composite short-term and long-term predictors and a composite short- 
term and long-term noise feedback filter, according to a first embodiment of 
the present invention. 

[0026] FIG. 2 is a block diagram of a second conventional noise feedback 

coding structure or codec. 

[0027] FIG. 2A is a block diagram of an example NFC structure or codec 

using a composite short-term and long-term predictor and a composite short- 
term and long-term noise feedback filter, according to a second embodiment 
of the present invention. 

[0028] FIG. 3 is a block diagram of a first example arrangement of an 

example NFC structure or codec, according to a third embodiment of the 
present invention. 

[0029] FIG. 4 is a block diagram of a first example arrangement of an 

example nested two-stage NFC structure or codec, according to a fourth 

embodiment of the present invention. 
[0030] FIG. 5 is a block diagram of a first example arrangement of an 

example nested two-stage NFC structure or codec, according to a fifth 

embodiment of the present invention. 
[0031] FIG. 5A is a block diagram of an alternative but mathematically 

equivalent signal combining arrangement corresponding to a signal combining 

arrangement of FIG. 5. 
[0032] FIG. 6 is a block diagram of a first example arrangement of an 

example nested two-stage NFC structure or codec, according to a sixth 

embodiment of the present invention. 
[0033] FIG. 6A is an example method of coding a speech or audio signal 

using any one of the codecs of FIGs. 3-6. 
[0034] FIG. 6B is a detailed method corresponding to a predictive quantizing 

step of FIG. 6A. 

[0035] FIG. 7 is a detailed block diagram of an example NFC encoding 

structure or coder based on the codec of FIG. 5, according to a preferred 
embodiment of the present invention. 

[0036] FIG. 8 is a detailed block diagram of an example NFC decoding 

structure or decoder for decoding encoded speech signals encoded using the 
coder of FIG. 7. 

[0037] FIG. 9 is a detailed block diagram of a short-term linear predictive 

analysis and quantization signal processing block of the coder of FIG. 7. The 
signal processing block obtains coefficients for a short-term predictor and a 
short-term noise feedback filter of the coder of FIG. 7. 

[0038] FIG. 10 is a detailed block diagram of a Line Spectrum Pair (LSP) 

quantizer and encoder signal processing block of the short-term linear 
predictive analysis and quantization signal processing block of FIG. 9. 
[0039] FIG. 11 is a detailed block diagram of a long-term linear predictive 

analysis and quantization signal processing block of the coder of FIG. 7. The 

signal processing block obtains coefficients for a long-term predictor and a 

long-term noise feedback filter of the coder of FIG. 7. 
[0040] FIG. 12 is a detailed block diagram of a prediction residual quantizer 

of the coder of FIG. 7. 
[0041] FIG. 13A is a block diagram of an example NFC system for searching 

through N VQ codevectors stored in a VQ codebook for a preferred one of the 

N VQ codevectors to be used for coding a speech or audio signal. 
[0042] FIG. 13B is a flow diagram of an example method, corresponding to 

the NFC system of FIG. 13A, of searching N VQ codevectors stored in a VQ 
codebook for a preferred one of the N VQ codevectors to be used in coding a 
speech or audio signal. 
[0043] FIG. 13C is a block diagram of a portion of an example codec structure 

or system used in an example prediction residual VQ codebook search of the 
codec of FIG. 5. 

[0044] FIG. 13D is an example method implemented by the system of FIG. 

13C. 

[0045] FIG. 13E is an example method executed concurrently with the method 

of FIG. 13D using the system of FIG. 13C. 
[0046] FIG. 14A is a block diagram of an example NFC system for efficiently 

searching through N VQ codevectors stored in a VQ codebook for a preferred 

one of the N VQ codevectors to be used for coding a speech or audio signal. 
[0047] FIG. 14B is an example method implemented using the system of 

FIG. 14A. 

[0048] FIG. 14C is an example filter structure, during a calculation of a 

ZERO-INPUT response of a quantization error signal, used in the example 
prediction residual VQ codebook search corresponding to FIG. 13C. 

[0049] FIG. 14D is an example method of deriving a ZERO-INPUT response 

using the ZERO-INPUT response filter structure of FIG. 14C. 

[0050] FIG. 14E is another example method of deriving a ZERO-INPUT 

response, executed concurrently with the method of FIG. 14D, using the 
ZERO-INPUT response filter structure of FIG. 14C. 

[0051] FIG. 15A is a block diagram of an example filter structure, during a 

calculation of a ZERO-STATE response of a quantization error signal, used in 
the example prediction residual VQ codebook search corresponding to FIGs. 
13C and 14C. 

[0052] FIG. 15B is a flowchart of an example method of deriving a 

ZERO-STATE response using the filter structure of FIG. 15A. 

[0053] FIG. 16A is a block diagram of a filter structure according to another 

embodiment of the ZERO-STATE response filter structure of FIG. 14A. 



[0054] FIG. 16B is a flowchart of an example method of deriving a ZERO- 

STATE response using the filter structure of FIG. 16A. 

[0055] FIG. 17 is a flowchart of an example method of reducing the 

computational complexity associated with searching a VQ codebook. 

[0056] FIG. 18 is a flow chart of an example method of quantizing multiple 

vectors in a master vector using correlation techniques, according to the 
present invention. 

[0057] FIG. 19 is a flowchart of an example method using an unsigned VQ 

codebook, expanding on the method of FIG. 18. 
[0058] FIG. 20 is a flow chart of an example method using a signed VQ 

codebook, expanding on the method of FIG. 18. 
[0059] FIG. 21 is a block diagram of a computer system on which the present 

invention can be implemented. 
I. Conventional Noise Feedback Coding 

[0060] Before describing the present invention, it is helpful to first describe 

the conventional noise feedback coding schemes. 

A. First Conventional Coder 

[0061] FIG. 1 is a block diagram of a first conventional NFC structure or 

codec 1000. Codec 1000 includes the following functional elements: a first 
predictor 1002 (also referred to as predictor P(z)); a first combiner or adder 
1004; a second combiner or adder 1006; a quantizer 1008; a third combiner or 
adder 1010; a second predictor 1012 (also referred to as a predictor P(z)); a 
fourth combiner 1014; and a noise feedback filter 1016 (also referred to as a 
filter F(z)). 

[0062] Codec 1000 encodes a sampled input speech or audio signal s(n) to 

produce a coded speech signal, and then decodes the coded speech signal to 
produce a reconstructed speech signal sq(n), representative of the input speech 
signal s(n). Reconstructed output speech signal sq(n) is associated with an 
overall coding noise r(n) = s(n) - sq(n). An encoder portion of codec 1000 
operates as follows. Sampled input speech or audio signal s(n) is provided to a 
first input of combiner 1004, and to an input of predictor 1002. Predictor 1002 
makes a prediction of current speech signal s(n) values (e.g., samples) based 
on past values of the speech signal to produce a predicted signal ps(n). This 
process is referred to as predicting signal s(n) to produce predicted signal 
ps(n). Predictor 1002 provides predicted speech signal ps(n) to a second input 
of combiner 1004. Combiner 1004 combines signals s(n) and ps(n) to produce 
a prediction residual signal d(n). 

[0063] Combiner 1006 combines residual signal d(n) with a noise feedback 

signal fq(n) to produce a quantizer input signal u(n). Quantizer 1008 
quantizes input signal u(n) to produce a quantized signal uq(n). Combiner 
1014 combines (that is, differences) signals u(n) and uq(n) to produce a 
quantization error or noise signal q(n) associated with the quantized signal 
uq(n). Filter 1016 filters noise signal q(n) to produce feedback noise signal 
fq(n). 

[0064] A decoder portion of codec 1000 operates as follows. Exiting 

quantizer 1008, combiner 1010 combines quantizer output signal uq(n) with a 
prediction ps(n)' of input speech signal s(n) to produce reconstructed output 
speech signal sq(n). Predictor 1012 predicts input speech signal s(n) to 
produce predicted speech signal ps(n)', based on past samples of output 
speech signal sq(n). 
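
The signal flow of codec 1000 described above can be sketched sample by sample in Python. The combiner signs used here (d = s - ps, u = d + fq, q = u - uq, sq = uq + ps'), the uniform rounding quantizer, the zero initial filter states, and the coefficient values are assumptions of this sketch; the text only states that the combiners "combine" their inputs.

```python
import math


def nfc_codec_1000(s, a, f, step):
    """Simulate the encoder and decoder of codec 1000; return (sq, q)."""
    M, L = len(a), len(f)
    sq, q = [], []
    for n in range(len(s)):
        ps = sum(a[i - 1] * s[n - i] for i in range(1, M + 1) if n - i >= 0)  # predictor 1002
        d = s[n] - ps                                                         # combiner 1004
        fq = sum(f[i - 1] * q[n - i] for i in range(1, L + 1) if n - i >= 0)  # filter 1016
        u = d + fq                                                            # combiner 1006
        uq = step * round(u / step)                                           # quantizer 1008
        q.append(u - uq)                                                      # combiner 1014
        ps_dec = sum(a[i - 1] * sq[n - i] for i in range(1, M + 1) if n - i >= 0)  # predictor 1012
        sq.append(uq + ps_dec)                                                # combiner 1010
    return sq, q


# Hypothetical test signal and coefficients.
s = [math.sin(0.3 * n) for n in range(60)]
a = [1.2, -0.5]    # P(z) coefficients
f = [0.96, -0.32]  # F(z) coefficients, here f_i = 0.8**i * a_i
sq, q = nfc_codec_1000(s, a, f, step=0.1)
r = [x - y for x, y in zip(s, sq)]  # overall coding noise r(n) = s(n) - sq(n)
```

With these sign choices, the coding noise r(n) equals q(n) shaped by the predictor and noise feedback coefficients, which is the behavior the noise feedback loop is meant to provide.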

[0065] The following is an analysis of codec 1000 described above. The 

predictor P(z) (1002 or 1012) has a transfer function of 

$$P(z) = \sum_{i=1}^{M} a_i z^{-i},$$

where M is the predictor order and $a_i$ is the i-th predictor coefficient. The 
noise feedback filter F(z) (1016) can have many possible forms. One popular 
form of F(z) is given by 

$$F(z) = \sum_{i=1}^{L} f_i z^{-i}.$$

This form of noise feedback filter was used by B. S. Atal and M. R. Schroeder 
in their publication "Predictive Coding of Speech Signals and Subjective Error 
Criteria," IEEE Transactions on Acoustics, Speech, and Signal Processing, pp. 
247-254, June 1979, with $L = M$ and $f_i = \alpha^i a_i$, or $F(z) = P(z/\alpha)$. 
[0066] With the NFC codec structure 1000 in FIG. 1, it can be shown that the 

codec reconstruction error, or coding noise, is given by 

$$r(n) = s(n) - sq(n) = \sum_{i=1}^{M} a_i r(n-i) + q(n) - \sum_{i=1}^{L} f_i q(n-i),$$

or in terms of z-transform representation, 

$$R(z) = \frac{1 - F(z)}{1 - P(z)} Q(z).$$

[0067] If the encoding bit rate of the quantizer 1008 in FIG. 1 is sufficiently 

high, the quantization error q(n) = u(n) - uq(n) is roughly white. From the 
equation above, it follows that the magnitude spectrum of the coding noise 
r(n) will have the same shape as the magnitude of the frequency response of 
the filter [1 - F(z)] / [1 - P(z)]. If F(z) = P(z), then R(z) = Q(z), the coding 
noise is white, and the system 1000 in FIG. 1 is equivalent to a conventional 
DPCM codec. If F(z) = 0, then R(z) = Q(z) / [1 - P(z)], the coding noise has 
the same spectral shape as the input signal spectrum, and the codec system 
1000 in FIG. 1 becomes a so-called "open-loop DPCM" codec. If F(z) is 
somewhere between P(z) and 0, for example, $F(z) = P(z/\alpha)$, where $0 < \alpha < 1$, 
then the spectrum of the coding noise is somewhere between a white spectrum 
and the input signal spectrum. Coding noise spectrally shaped this way is 
indeed less audible than either the white noise or the noise with spectral shape 
identical to the input signal spectrum. 
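
The three cases discussed above can be checked numerically with a short Python sketch that evaluates the magnitude of [1 - F(z)] / [1 - P(z)] on the unit circle. The first-order predictor, the value 0.8 for the bandwidth-expansion factor, and the function names are assumptions chosen for illustration.

```python
import cmath


def bandwidth_expand(a, alpha):
    """Return the coefficients of P(z/alpha): f_i = alpha**i * a_i."""
    return [alpha ** i * a_i for i, a_i in enumerate(a, start=1)]


def noise_shaping_gain(a, f, omega):
    """Magnitude of [1 - F(z)] / [1 - P(z)] at z = exp(j * omega)."""
    z_inv = cmath.exp(-1j * omega)
    p_val = sum(a_i * z_inv ** i for i, a_i in enumerate(a, start=1))
    f_val = sum(f_i * z_inv ** i for i, f_i in enumerate(f, start=1))
    return abs((1 - f_val) / (1 - p_val))


a = [0.9]  # hypothetical first-order predictor with a spectral peak at omega = 0
gain_dpcm = noise_shaping_gain(a, a, 0.0)                           # F(z) = P(z)
gain_open_loop = noise_shaping_gain(a, [0.0], 0.0)                  # F(z) = 0
gain_shaped = noise_shaping_gain(a, bandwidth_expand(a, 0.8), 0.0)  # F(z) = P(z/0.8)
```

At the spectral peak of this example, the shaped noise gain falls between the flat gain of the F(z) = P(z) case and the full input-spectrum gain of the F(z) = 0 case.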

B. Second Conventional Codec 

[0068] FIG. 2 is a block diagram of a second conventional NFC structure or 

codec 2000. Codec 2000 includes the following functional elements: a first 
combiner or adder 2004; a second combiner or adder 2006; a quantizer 2008; a 
third combiner or adder 2010; a predictor 2012 (also referred to as a predictor 
P(z)); a fourth combiner 2014; and a noise feedback filter 2016 (also referred 
to as a filter N(z) - 1). 

[0069] Codec 2000 encodes a sampled input speech signal s(n) to produce a 

coded speech signal, and then decodes the coded speech signal to produce a 
reconstructed speech signal sq(n), representative of the input speech signal 
s(n). Reconstructed speech signal sq(n) is associated with an overall coding 
noise r(n) = s(n) - sq(n). Codec 2000 operates as follows. A sampled input 
speech or audio signal s(n) is provided to a first input of combiner 2004. A 
feedback signal x(n) is provided to a second input of combiner 2004. 
Combiner 2004 combines signals s(n) and x(n) to produce a quantizer input 
signal u(n). Quantizer 2008 quantizes input signal u(n) to produce a quantized 
signal uq(n) (also referred to as a quantizer output signal uq(n)). Combiner 
2014 combines (that is, differences) signals u(n) and uq(n) to produce a 
quantization error or noise signal q(n) associated with the quantized signal 
uq(n). Filter 2016 filters noise signal q(n) to produce feedback noise signal 
fq(n). Combiner 2006 combines feedback noise signal fq(n) with a predicted 
signal ps(n) (i.e., a prediction of input speech signal s(n)) to produce feedback 
signal x(n). 

[0070] Exiting quantizer 2008, combiner 2010 combines quantizer output 

signal uq(n) with prediction or predicted signal ps(n) to produce reconstructed 
output speech signal sq(n). Predictor 2012 predicts input speech signal s(n) 
(to produce predicted speech signal ps(n)) based on past samples of output 
speech signal sq(n). Thus, predictor 2012 is included in the encoder and 
decoder portions of codec 2000. 

[0071] Codec structure 2000 was proposed by J. D. Makhoul and M. Berouti 

in "Adaptive Noise Spectral Shaping and Entropy Coding in Predictive 
Coding of Speech," IEEE Transactions on Acoustics, Speech, and Signal 
Processing, pp. 63-73, February 1979. This equivalent, known NFC codec 
structure 2000 has at least two advantages over codec 1000. First, only one 
predictor P(z) (2012) is used in the structure. Second, if N(z) is the filter 
whose frequency response corresponds to the desired noise spectral shape, this 
codec structure 2000 allows us to use [N(z) - 1] directly as the noise feedback 
filter 2016. Makhoul and Berouti showed in their 1979 paper that very good 
perceptual speech quality can be obtained by choosing N(z) to be a simple 
second-order finite-impulse-response (FIR) filter. 
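
The operation of codec 2000 can likewise be sketched in Python. The combiner signs (x = ps + fq, u = s - x, q = u - uq, sq = uq + ps), the rounding quantizer, the zero initial states, and the coefficient values are assumptions of this sketch. Under these assumptions the coding noise works out to r(n) = q(n) + fq(n), that is, the quantization error filtered by N(z).

```python
import math


def nfc_codec_2000(s, a, n_taps, step):
    """Simulate codec 2000; n_taps holds the coefficients of N(z) - 1."""
    M, K = len(a), len(n_taps)
    sq, q = [], []
    for n in range(len(s)):
        ps = sum(a[i - 1] * sq[n - i] for i in range(1, M + 1) if n - i >= 0)      # predictor 2012
        fq = sum(n_taps[i - 1] * q[n - i] for i in range(1, K + 1) if n - i >= 0)  # filter 2016
        x = ps + fq                  # combiner 2006
        u = s[n] - x                 # combiner 2004
        uq = step * round(u / step)  # quantizer 2008
        q.append(u - uq)             # combiner 2014
        sq.append(uq + ps)           # combiner 2010
    return sq, q


# Hypothetical signal, predictor, and second-order FIR N(z) = 1 + 0.5 z^-1 + 0.2 z^-2.
s = [math.sin(0.3 * n) for n in range(60)]
a = [1.2, -0.5]
n_taps = [0.5, 0.2]
sq, q = nfc_codec_2000(s, a, n_taps, step=0.1)
r = [x - y for x, y in zip(s, sq)]  # overall coding noise
```

Only one predictor appears in the loop, and the desired noise shape N(z) enters the code directly through `n_taps`, mirroring the two advantages noted above.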

[0072] The codec structures in FIGs. 1 and 2 described above can each be 

viewed as a predictive codec with an additional noise feedback loop. In 
FIG. 1, a noise feedback loop is added to the structure of an "open-loop 
DPCM" codec, where the predictor in the encoder uses the unquantized original 
input signal as its input. In FIG. 2, on the other hand, a noise feedback loop is 
added to the structure of a "closed-loop DPCM" codec, where the predictor in 
the encoder uses the quantized signal as its input. Other than this difference in 
the signal that is used as the predictor input in the encoder, the codec 
structures in FIG. 1 and FIG. 2 are conceptually very similar. 

II. Two-Stage Noise Feedback Coding 

[0073] The conventional noise feedback coding principles described above are 

well-known prior art. Now we will address our stated problem of two-stage 
noise feedback coding with both short-term and long-term prediction, and both 
short-term and long-term noise spectral shaping. 

A. Composite Codec Embodiments 

[0074] A first approach is to combine a short-term predictor and a long-term 

predictor into a single composite short-term and long-term predictor, and then 
re-use the general structure of codec 1000 in FIG. 1 or that of codec 2000 in 
FIG. 2 to construct an improved codec corresponding to the general structure 
of codec 1000 and an improved codec corresponding to the general structure 
of codec 2000. Note that in FIG. 1, the feedback loop to the right of the 
symbol uq(n) that includes the adder 1010 and the predictor loop (including 
predictor 1012) is often called a synthesis filter, and has a transfer function of 
1/[1 - P(z)]. Also note that in most predictive codecs employing both 
short-term and long-term prediction, the decoder has two such synthesis filters 
cascaded: one with the short-term predictor and the other with the long-term 
predictor in the feedback loop. Let Ps(z) and Pl(z) be the transfer functions of 
the short-term predictor and the long-term predictor, respectively. Then, the 
cascaded synthesis filter will have a transfer function of 

$$\frac{1}{[1 - Ps(z)][1 - Pl(z)]} = \frac{1}{1 - Ps(z) - Pl(z) + Ps(z)Pl(z)} = \frac{1}{1 - P'(z)},$$

where P'(z) = Ps(z) + Pl(z) - Ps(z)Pl(z) is the composite predictor (for 
example, the predictor that includes the effects of both short-term prediction 
and long-term prediction). 
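
The composite predictor can be formed by ordinary polynomial arithmetic on the predictor coefficients, as the following Python sketch shows. The coefficient-list representation (index k holds the coefficient of z^-k), the example predictors, and the function names are assumptions made for illustration.

```python
def poly_mul(x, y):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def composite_predictor(ps, pl):
    """Return P'(z) = Ps(z) + Pl(z) - Ps(z)Pl(z) as a coefficient list."""
    product = poly_mul(ps, pl)
    size = len(product)
    ps_padded = ps + [0.0] * (size - len(ps))
    pl_padded = pl + [0.0] * (size - len(pl))
    return [s + l - p for s, l, p in zip(ps_padded, pl_padded, product)]


def one_minus(coeffs):
    """Return the coefficient list of 1 - C(z)."""
    return [1.0 - coeffs[0]] + [-c for c in coeffs[1:]]


ps = [0.0, 0.5]             # Ps(z) = 0.5 z^-1 (hypothetical short-term predictor)
pl = [0.0, 0.0, 0.0, 0.25]  # Pl(z) = 0.25 z^-3 (hypothetical pitch period of 3)
p_composite = composite_predictor(ps, pl)
cascade = poly_mul(one_minus(ps), one_minus(pl))  # [1 - Ps(z)][1 - Pl(z)]
```

For this example the cascade of the two single-stage filters and the filter built from the composite predictor have identical coefficients, so either form can be used.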
[0075] Similarly, in FIG. 1, the filter structure to the left of the symbol d(n), 

including the adder 1004 and the predictor loop (i.e., including predictor 
1002), is often called an analysis filter, and has a transfer function of 1 - P(z). 
If we cascade two such analysis filters, one with the short-term predictor and 
the other with the long-term predictor, then the transfer function of the 
cascaded analysis filter is 

$$[1 - Ps(z)][1 - Pl(z)] = 1 - Ps(z) - Pl(z) + Ps(z)Pl(z) = 1 - P'(z).$$
[0076] Therefore, one can replace the predictor P(z) (1002 or 1012) in FIG. 1 

and the predictor P(z) (2012) in FIG. 2 by the composite predictor P'(z) = 
Ps(z) + Pl(z) - Ps(z)Pl(z) to get the effect of two-stage prediction. To get both 
short-term and long-term noise spectral shaping, one can use the general 
coding structure of codec 1000 in FIG. 1 and choose the filter transfer function 
$F(z) = Ps(z/\alpha) + Pl(z/\beta) - Ps(z/\alpha)Pl(z/\beta) = F'(z)$. Then, the noise spectral 
shape will follow the frequency response of the filter 

$$\frac{1 - F'(z)}{1 - P'(z)} = \frac{1 - Ps(z/\alpha) - Pl(z/\beta) + Ps(z/\alpha)Pl(z/\beta)}{1 - Ps(z) - Pl(z) + Ps(z)Pl(z)} = \frac{[1 - Ps(z/\alpha)][1 - Pl(z/\beta)]}{[1 - Ps(z)][1 - Pl(z)]}.$$

[0077] Thus, both short-term noise spectral shaping and long-term spectral 

shaping are achieved, and they can be individually controlled by the 
parameters $\alpha$ and $\beta$, respectively. 

1. First Codec Embodiment - Composite Codec 

[0078] FIG. 1A is a block diagram of an example NFC structure or codec 

1050 using composite short-term and long-term predictors P'(z) and a 
composite short-term and long-term noise feedback filter F' (z), according to a 
first embodiment of the present invention. Codec 1050 reuses the general 
structure of known codec 1000 in FIG. 1, but replaces the predictors P(z) and 
filter F(z) of codec 1000 with the composite predictors P'(z) and the composite 
filter F'(z), as is further described below. 
[0079] Codec 1050 includes the following functional elements: a first composite 

short-term and long-term predictor 1052 (also referred to as a composite 
predictor P'(z)); a first combiner or adder 1054; a second combiner or adder 
1056; a quantizer 1058; a third combiner or adder 1060; a second composite 
short-term and long-term predictor 1062 (also referred to as a composite 
predictor P'(z)); a fourth combiner 1064; and a composite short-term and 
long-term noise feedback filter 1066 (also referred to as a filter F'(z)). 

[0080] The functional elements or blocks of codec 1050 listed above are 

arranged similarly to the corresponding blocks of codec 1000 (described above 
in connection with FIG. 1) having reference numerals decreased by "50." 
Accordingly, signal flow between the functional blocks of codec 1050 is 
similar to signal flow between the corresponding blocks of codec 1000. 

[0081] Codec 1050 encodes a sampled input speech signal s(n) to produce a 

coded speech signal, and then decodes the coded speech signal to produce a 
reconstructed speech signal sq(n), representative of the input speech signal 
s(n). Reconstructed speech signal sq(n) is associated with an overall coding 
noise r(n) = s(n) - sq(n). An encoder portion of codec 1050 operates in the 
following exemplary manner. Composite predictor 1052 short-term and long- 
term predicts input speech signal s(n) to produce a short-term and long-term 
predicted speech signal ps(n). Combiner 1054 combines short-term and long- 
term predicted signal ps(n) with speech signal s(n) to produce a prediction 
residual signal d(n). 

[0082] Combiner 1056 combines residual signal d(n) with a short-term and 

long-term filtered, noise feedback signal fq(n) to produce a quantizer input 
signal u(n). Quantizer 1058 quantizes input signal u(n) to produce a quantized 
signal uq(n) (also referred to as a quantizer output signal) associated with a 
quantization noise or error signal q(n). Combiner 1064 combines (that is, 
differences) signals u(n) and uq(n) to produce the quantization error or noise 
signal q(n). Composite filter 1066 short-term and long-term filters noise 
signal q(n) to produce short-term and long-term filtered, feedback noise signal 
fq(n). In codec 1050, combiner 1064, composite short-term and long-term 
filter 1066, and combiner 1056 together form a noise feedback loop around 
quantizer 1058. This noise feedback loop spectrally shapes the coding noise 
associated with codec 1050, in accordance with the composite filter, to follow, 
for example, the short-term and long-term spectral characteristics of input 
speech signal s(n). 

[0083] A decoder portion of codec 1050 operates in the following exemplary 

manner. Exiting quantizer 1058, combiner 1060 combines quantizer output 
signal uq(n) with a short-term and long-term prediction ps(n)' of input speech 
signal s(n) to produce a quantized output speech signal sq(n). Composite 
predictor 1062 short-term and long-term predicts input speech signal s(n) (to 
produce short-term and long-term predicted signal ps(n)') based on output 
signal sq(n). 
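
By way of a non-limiting illustration, the signal flow of codec 1050 may be sketched in Python as follows. The sketch is not part of the original disclosure and rests on several assumptions: the composite predictor P'(z) and composite filter F'(z) are modeled as FIR filters with hypothetical tap lists `p` and `f`, quantizer 1058 is a uniform scalar quantizer with a hypothetical step size, and the combiner signs (d(n) = s(n) - ps(n), u(n) = d(n) + fq(n), q(n) = u(n) - uq(n)) are chosen so that the coding noise is shaped by [1 - F'(z)]/[1 - P'(z)].

```python
def fir(coefs, history, n):
    """Return sum over k >= 1 of coefs[k-1] * history[n-k], with zero initial state."""
    return sum(c * history[n - k] for k, c in enumerate(coefs, start=1) if n - k >= 0)


def nfc_fig1(s, p, f, step=0.25):
    """Single-stage NFC in the FIG. 1A arrangement (illustrative sketch only)."""
    sq, q = [], []
    for n in range(len(s)):
        d = s[n] - fir(p, s, n)          # predictor 1052 and combiner 1054
        u = d + fir(f, q, n)             # filter 1066 and combiner 1056
        uq = step * round(u / step)      # quantizer 1058 (assumed uniform scalar)
        q.append(u - uq)                 # combiner 1064: q(n) = u(n) - uq(n)
        sq.append(uq + fir(p, sq, n))    # predictor 1062 and combiner 1060
    return sq, q
```

Under these assumptions, the coding noise r(n) = s(n) - sq(n) satisfies r(n) - Σ p_k r(n-k) = q(n) - Σ f_k q(n-k), which is the time-domain form of the noise shaping performed by the feedback loop.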

2. Second Codec Embodiment - Alternative Composite 
Codec 

[0084] As an alternative to the above described first embodiment, a second 

embodiment of the present invention can be constructed based on the general 
coding structure of codec 2000 in FIG. 2. Using the coding structure of codec 
2000 with P(z) replaced by composite function P'(z), one can choose a 
suitable composite noise feedback filter N'(z) - 1 (replacing filter 2016) such 
that it includes the effects of both short-term and long-term noise spectral 
shaping. For example, N'(z) can be chosen to contain two FIR filters in 
cascade: a short-term filter to control the envelope of the noise spectrum, 
while another, long-term filter, controls the harmonic structure of the noise 
spectrum. 

[0085] FIG. 2A is a block diagram of an example NFC structure or codec 

2050 using a composite short-term and long-term predictor P'(z) and a 
composite short-term and long-term noise feedback filter N'(z)-1, according to 
a second embodiment of the present invention. Codec 2050 includes the 
following functional elements: a first combiner or adder 2054; a second 
combiner or adder 2056; a quantizer 2058; a third combiner or adder 2060; a 
composite short-term and long-term predictor 2062 (also referred to as a 
predictor P'(z)); a fourth combiner 2064; and a noise feedback filter 2066 (also 
referred to as a filter N'(z)-1). 

[0086] The functional elements or blocks of codec 2050 listed above are 

arranged similarly to the corresponding blocks of codec 2000 (described above 
in connection with FIG. 2) having reference numerals decreased by "50." 
Accordingly, signal flow between the functional blocks of codec 2050 is 
similar to signal flow between the corresponding blocks of codec 2000. 

[0087] Codec 2050 operates in the following exemplary manner. Combiner 

2054 combines a sampled input speech or audio signal s(n) with a feedback 
signal x(n) to produce a quantizer input signal u(n). Quantizer 2058 quantizes 
input signal u(n) to produce a quantized signal uq(n) associated with a 
quantization noise or error signal q(n). Combiner 2064 combines (that is, 
differences) signals u(n) and uq(n) to produce quantization error or noise 
signal q(n). Composite filter 2066 concurrently long-term and short-term 
filters noise signal q(n) to produce short-term and long-term filtered, feedback 
noise signal fq(n). Combiner 2056 combines short-term and long-term 
filtered, feedback noise signal fq(n) with a short-term and long-term prediction 
ps(n)' of input signal s(n) to produce feedback signal x(n). In codec 2050, 
combiner 2064, composite short-term and long-term filter 2066, and combiner 
2056 together form a noise feedback loop around quantizer 2058. This noise 
feedback loop spectrally shapes the coding noise associated with codec 2050 
in accordance with the composite filter, to follow, for example, the short-term 
and long-term spectral characteristics of input speech signal s(n). 
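
As a hedged illustration only, the operation of codec 2050 described above may be sketched in Python. The sketch assumes FIR models for the composite predictor P'(z) (hypothetical taps `p`) and for the noise feedback filter N'(z) - 1 (hypothetical taps `c`), a uniform scalar quantizer, and the combiner signs x(n) = ps(n) + fq(n), u(n) = s(n) - x(n), and q(n) = u(n) - uq(n). None of these names or values come from the original text.

```python
def fir(coefs, history, n):
    """Return sum over k >= 1 of coefs[k-1] * history[n-k], with zero initial state."""
    return sum(c * history[n - k] for k, c in enumerate(coefs, start=1) if n - k >= 0)


def nfc_fig2(s, p, c, step=0.25):
    """Single-stage NFC in the FIG. 2A arrangement (illustrative sketch only)."""
    sq, q = [], []
    for n in range(len(s)):
        ps = fir(p, sq, n)               # predictor 2062, driven by sq(n)
        x = ps + fir(c, q, n)            # filter 2066 and combiner 2056
        u = s[n] - x                     # combiner 2054
        uq = step * round(u / step)      # quantizer 2058 (assumed uniform scalar)
        q.append(u - uq)                 # combiner 2064: q(n) = u(n) - uq(n)
        sq.append(uq + ps)               # combiner 2060
    return sq, q
```

With these sign conventions, the coding noise is r(n) = q(n) + Σ c_k q(n-k), so the quantization noise is shaped by N'(z) alone, without the predictor appearing in the noise transfer function.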

[0088] Exiting quantizer 2058, combiner 2060 combines quantizer output 

signal uq(n) with the short-term and long-term predicted signal ps(n)' to 
produce a reconstructed output speech signal sq(n). Composite predictor 2062 
short-term and long-term predicts input speech signal s(n) (to produce short- 
term and long-term predicted signal ps(n)') based on reconstructed output 
speech signal sq(n). 

[0089] In this invention, the first approach for two-stage NFC described above 

achieves the goal by re-using the general codec structure of conventional 
single-stage noise feedback coding (for example, by re-using the structures of 
codecs 1000 and 2000) but combining what are conventionally separate short- 
term and long-term predictors into a single composite short-term and long- 
term predictor. A second preferred approach, described below, allows 
separate short-term and long-term predictors to be used, but requires a 
modification of the conventional codec structures 1000 and 2000 of Figs. 1 
and 2. 

B. Codec Embodiments Using Separate Short-Term and Long- 
Term Predictors (Two-Stage Prediction) and Noise Feedback 
Coding 

[0090] It is not obvious how the codec structures in Figs. 1 and 2 should be 

modified in order to achieve two-stage prediction and two-stage noise spectral 
shaping at the same time. For example, assuming the filters in FIG. 1 are all 
short-term filters, then, cascading a long-term analysis filter after the short- 
term analysis filter, cascading a long-term synthesis filter before the short- 
term synthesis filter, and cascading a long-term noise feedback filter to the 
short-term noise feedback filter in FIG. 1 will not give a codec that achieves 
the desired result. 

[0091] To achieve two-stage prediction and two-stage noise spectral shaping 

at the same time without combining the two predictors into one, the key lies in 
recognizing that the quantizer block in Figs. 1 and 2 can be replaced by a 
coding system based on long-term prediction. Illustrations of this concept are 
provided below. 

1. Third Codec Embodiment - Two Stage Prediction With 
One Stage Noise Feedback 

[0092] As an illustration of this concept, FIG. 3 shows a codec structure where 

the quantizer block 1008 in FIG. 1 has been replaced by a DPCM-type 
structure based on long-term prediction (enclosed by the dashed box and 
labeled as Q' in FIG. 3). FIG. 3 is a block diagram of a first exemplary 
arrangement of an example NFC structure or codec 3000, according to a third 
embodiment of the present invention. 

[0093] Codec 3000 includes the following functional elements: a first short- 

term predictor 3002 (also referred to as a short-term predictor Ps(z)); a first 
combiner or adder 3004; a second combiner or adder 3006; predictive 
quantizer 3008 (also referred to as predictive quantizer Q'); a third combiner 
or adder 3010; a second short-term predictor 3012 (also referred to as a short- 
term predictor Ps(z)); a fourth combiner 3014; and a short-term noise feedback 
filter 3016 (also referred to as a short-term noise feedback filter Fs(z)). 

[0103] Predictive quantizer Q' (3008) includes a first combiner 3024, either a 

scalar or a vector quantizer 3028, a second combiner 3030, and a long-term 
predictor 3034 (also referred to as a long-term predictor Pl(z)). 

[0104] Codec 3000 encodes a sampled input speech signal s(n) to produce a 

coded speech signal, and then decodes the coded speech signal to produce a 
reconstructed output speech signal sq(n), representative of the input speech 
signal s(n). Reconstructed speech signal sq(n) is associated with an overall 
coding noise r(n) = s(n) - sq(n). Codec 3000 operates in the following 
exemplary manner. First, a sampled input speech or audio signal s(n) is 
provided to a first input of combiner 3004, and to an input of predictor 3002. 
Predictor 3002 makes a short-term prediction of input speech signal s(n) based 
on past samples thereof to produce a predicted input speech signal ps(n). This 
process is referred to as short-term predicting input speech signal s(n) to 
produce predicted signal ps(n). Predictor 3002 provides predicted input 
speech signal ps(n) to a second input of combiner 3004. Combiner 3004 
combines signals s(n) and ps(n) to produce a prediction residual signal d(n). 

[0105] Combiner 3006 combines residual signal d(n) with a first noise 

feedback signal fqs(n) to produce a predictive quantizer input signal v(n). 
Predictive quantizer 3008 predictively quantizes input signal v(n) to produce a 
predictively quantized output signal vq(n) (also referred to as a predictive 
quantizer output signal vq(n)) associated with a predictive noise or error signal 
qs(n). Combiner 3014 combines (that is, differences) signals v(n) and vq(n) to 
produce the predictive quantization error or noise signal qs(n). Short-term 
filter 3016 short-term filters predictive quantization noise signal qs(n) to 
produce the feedback noise signal fqs(n). Therefore, Noise Feedback (NF) 
codec 3000 includes an outer NF loop around predictive quantizer 3008, 
comprising combiner 3014, short-term noise filter 3016, and combiner 3006. 
This outer NF loop spectrally shapes the coding noise associated with codec 
3000 in accordance with filter 3016, to follow, for example, the short-term 
spectral characteristics of input speech signal s(n). 

[0106] Predictive quantizer 3008 operates within the outer NF loop mentioned 

above to predictively quantize predictive quantizer input signal v(n) in the 
following exemplary manner. Predictor 3034 long-term predicts (i.e., makes a 
long-term prediction of) predictive quantizer input signal v(n) to produce a 
predicted, predictive quantizer input signal pv(n). Combiner 3024 combines 
signal pv(n) with predictive quantizer input signal v(n) to produce a quantizer 
input signal u(n). Quantizer 3028 quantizes quantizer input signal u(n) using a 
scalar or vector quantizing technique, to produce a quantizer output signal 
uq(n). Combiner 3030 combines quantizer output signal uq(n) with signal 
pv(n) to produce predictively quantized output signal vq(n). 
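
The operation of predictive quantizer Q' (3008) may be illustrated, under stated assumptions, by the following Python sketch. It shows the DPCM-type structure in isolation, fed with a given sequence v(n), whereas in codec 3000 each sample of v(n) depends on the outer feedback loop. The one-tap long-term predictor (hypothetical parameters `tap` and `lag`), the uniform scalar quantizer, and the sign convention u(n) = v(n) - pv(n) are assumptions made for illustration.

```python
def predictive_quantizer_dpcm(v, tap, lag, step=0.25):
    """DPCM-type predictive quantizer Q' (illustrative sketch only)."""
    vq, q = [], []
    for n in range(len(v)):
        pv = tap * vq[n - lag] if n >= lag else 0.0   # predictor 3034 (assumed one-tap)
        u = v[n] - pv                                 # combiner 3024
        uq = step * round(u / step)                   # quantizer 3028 (assumed scalar)
        vq.append(uq + pv)                            # combiner 3030
        q.append(u - uq)                              # quantization error of 3028
    return vq, q
```

Because pv(n) is subtracted before quantization and added back afterwards, the error v(n) - vq(n) of the predictive quantizer equals the error u(n) - uq(n) of the inner quantizer, so the structure performs long-term prediction without reshaping the noise.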

[0107] Exiting predictive quantizer 3008, combiner 3010 combines predictive 

quantizer output signal vq(n) with a prediction ps(n)' of input speech signal 
s(n) to produce output speech signal sq(n). Predictor 3012 short-term predicts 
(i.e., makes a short-term prediction of) input speech signal s(n) to produce 
signal ps(n)', based on output speech signal sq(n). 

[0108] In the first exemplary arrangement of NF codec 3000 depicted in FIG. 

3, predictors 3002, 3012 are short-term predictors and NF filter 3016 is a 
short-term noise filter, while predictor 3034 is a long-term predictor. In a 
second exemplary arrangement of NF codec 3000, predictors 3002, 3012 are 
long-term predictors and NF filter 3016 is a long-term filter, while predictor 
3034 is a short-term predictor. The outer NF loop in this alternative 
arrangement spectrally shapes the coding noise associated with codec 3000 in 
accordance with filter 3016, to follow, for example, the long-term spectral 
characteristics of input speech signal s(n). 
[0109] In the first arrangement described above, the DPCM structure inside 

the Q' dashed box (3008) does not perform long-term noise spectral shaping. 
If everything inside the Q' dashed box (3008) is treated as a black box, then 
for an observer outside of the box, the replacement of a direct quantizer (for 
example, quantizer 1008) by a long-term-prediction-based DPCM structure 
(that is, predictive quantizer Q' (3008)) is an advantageous way to improve the 
quantizer performance. Thus, compared with FIG. 1, the codec structure of 
codec 3000 in FIG. 3 will achieve the advantage of a lower coding noise, 
while maintaining the same kind of noise spectral envelope. In fact, when the 
bit rate is high enough, the system 3000 in FIG. 3 is adequate for some 
applications, and it is simple because it avoids the additional complexity 
associated with long-term noise spectral shaping. 

2. Fourth Codec Embodiment - Two Stage Prediction 
With Two Stage Noise Feedback (Nested Two Stage 
Feedback Coding) 

[0110] Taking the above concept one step further, predictive quantizer Q' 

(3008) of codec 3000 in FIG. 3 can be replaced by the complete NFC structure 
of codec 1000 in FIG. 1. A resulting example "nested" or "layered" two-stage 
NFC codec structure 4000 is depicted in FIG. 4, and described below. 

[0111] FIG. 4 is a block diagram of a first exemplary arrangement of the 

example nested two-stage NF coding structure or codec 4000, according to a 
fourth embodiment of the present invention. Codec 4000 includes the 
following functional elements: a first short-term predictor 4002 (also referred 
to as a short-term predictor Ps(z)); a first combiner or adder 4004; a second 
combiner or adder 4006; a predictive quantizer 4008 (also referred to as a 
predictive quantizer Q"); a third combiner or adder 4010; a second short-term 
predictor 4012 (also referred to as a short-term predictor Ps(z)); a fourth 
combiner 4014; and a short-term noise feedback filter 4016 (also referred to as 
a short-term noise feedback filter Fs(z)). 

[0112] Predictive quantizer Q" (4008) includes a first long-term predictor 

4022 (also referred to as a long-term predictor Pl(z)), a first combiner 4024, 
either a scalar or a vector quantizer 4028, a second combiner 4030, a second 
long-term predictor 4034 (also referred to as a long-term predictor Pl(z)), a 
second combiner or adder 4036, and a long-term filter 4038 (also referred to as 
a long-term filter Fl(z)). 

[0113] Codec 4000 encodes a sampled input speech signal s(n) to produce a 

coded speech signal, and then decodes the coded speech signal to produce a 
reconstructed output speech signal sq(n), representative of the input speech 
signal s(n). Reconstructed speech signal sq(n) is associated with an overall 
coding noise r(n) = s(n) - sq(n). In coding input speech signal s(n), predictors 
4002 and 4012, combiners 4004, 4006, and 4010, and noise filter 4016 operate 
similarly to corresponding elements described above in connection with FIG. 
3 having reference numerals decreased by "1000". Therefore, NF codec 4000 
includes an outer or first stage NF loop comprising combiner 4014, short-term 
noise filter 4016, and combiner 4006. This outer NF loop spectrally shapes 
the coding noise associated with codec 4000 in accordance with filter 4016, to 
follow, for example, the short-term spectral characteristics of input speech 
signal s(n). 

[0114] Predictive quantizer Q" (4008) operates within the outer NF loop 

mentioned above to predictively quantize predictive quantizer input signal 
v(n) to produce a predictively quantized output signal vq(n) (also referred to as 
a predictive quantizer output signal vq(n)) in the following exemplary manner. 
As mentioned above, predictive quantizer Q" has a structure corresponding to 
the basic NFC structure of codec 1000 depicted in FIG. 1. In operation, 
predictor 4022 long-term predicts predictive quantizer input signal v(n) to 
produce a predicted version pv(n) thereof. Combiner 4024 combines signals 
v(n) and pv(n) to produce an intermediate result signal i(n). Combiner 4026 
combines intermediate result signal i(n) with a second noise feedback signal 
fq(n) to produce a quantizer input signal u(n). Quantizer 4028 quantizes input 
signal u(n) to produce a quantized output signal uq(n) (or quantizer output 
signal uq(n)) associated with a quantization error or noise signal q(n). 
Combiner 4036 combines (differences) signals u(n) and uq(n) to produce the 
quantization noise signal q(n). Long-term filter 4038 long-term filters the 
noise signal q(n) to produce feedback noise signal fq(n). Therefore, combiner 
4036, long-term filter 4038 and combiner 4026 form an inner or second stage 
NF loop nested within the outer NF loop. This inner NF loop spectrally shapes 
the coding noise associated with codec 4000 in accordance with filter 4038, to 
follow, for example, the long-term spectral characteristics of input speech 
signal s(n). 

[0115] Exiting quantizer 4028, combiner 4030 combines quantizer output 

signal uq(n) with a prediction pv(n)' of predictive quantizer input signal v(n). 
Long-term predictor 4034 long-term predicts signal v(n) (to produce predicted 
signal pv(n)') based on signal vq(n). 

[0116] Exiting predictive quantizer Q" (4008), predictively quantized signal 

vq(n) is combined with a prediction ps(n)' of input speech signal s(n) to 
produce reconstructed speech signal sq(n). Predictor 4012 short term predicts 
input speech signal s(n) (to produce predicted signal ps(n)') based on 
reconstructed speech signal sq(n). 

[0117] In the first exemplary arrangement of NF codec 4000 depicted in 

FIG. 4, predictors 4002 and 4012 are short-term predictors and NF filter 4016 
is a short-term noise filter, while predictors 4022, 4034 are long-term 
predictors and noise filter 4038 is a long-term noise filter. In a second 
exemplary arrangement of NF codec 4000, predictors 4002, 4012 are long- 
term predictors and NF filter 4016 is a long-term noise filter (to spectrally 
shape the coding noise to follow, for example, the long-term characteristic of 
the input speech signal s(n)), while predictors 4022, 4034 are short-term 
predictors and noise filter 4038 is a short-term noise filter (to spectrally shape 
the coding noise to follow, for example, the short-term characteristic of the 
input speech signal s(n)). 

[0118] In the first arrangement of codec 4000 depicted in FIG. 4, the dashed 

box labeled as Q" (predictive quantizer Q" (4008)) contains an NFC codec 
structure just like the structure of codec 1000 in FIG. 1, but the predictors 
4022, 4034 and noise feedback filter 4038 are all long-term filters. Therefore, 
the quantization error qs(n) of the "predictive quantizer" Q" (4008) is simply 
the reconstruction error, or coding noise of the NFC structure inside the Q" 
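dashed box 4008. Hence, from the earlier equation, we have 

$$QS(z) = \frac{1 - F_l(z)}{1 - P_l(z)}\, Q(z).$$

Thus, the z-transform of the overall coding noise of codec 4000 in FIG. 4 is 

$$R(z) = \frac{1 - F_s(z)}{1 - P_s(z)}\, QS(z) = \frac{[1 - F_s(z)][1 - F_l(z)]}{[1 - P_s(z)][1 - P_l(z)]}\, Q(z).$$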

This proves that the nested two-stage NFC codec structure 4000 in FIG. 4 
indeed performs both short-term and long-term noise spectral shaping, in 
addition to short-term and long-term prediction. 
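
The two-stage noise shaping property of codec 4000 can also be checked numerically. The following Python sketch is offered only as an illustration under stated assumptions: all four filters are modeled as FIR filters with hypothetical tap lists, quantizer 4028 is a uniform scalar quantizer, and the combiner signs are chosen so that each stage follows the single-stage relation between coding noise and quantization noise. The helper `one_minus` is likewise hypothetical and applies 1 - A(z) to a sequence.

```python
def fir(coefs, history, n):
    """Return sum over k >= 1 of coefs[k-1] * history[n-k], with zero initial state."""
    return sum(c * history[n - k] for k, c in enumerate(coefs, start=1) if n - k >= 0)


def one_minus(taps, x):
    """Apply the filter 1 - A(z) to sequence x, where taps holds the taps of A(z)."""
    return [x[n] - fir(taps, x, n) for n in range(len(x))]


def codec4000(s, ps_taps, fs_taps, pl_taps, fl_taps, step=0.25):
    """Nested two-stage NFC in the FIG. 4 arrangement (illustrative sketch only)."""
    sq, vq, v, qs, q = [], [], [], [], []
    for n in range(len(s)):
        d = s[n] - fir(ps_taps, s, n)            # predictor 4002 and combiner 4004
        v.append(d + fir(fs_taps, qs, n))        # filter 4016 and combiner 4006
        i = v[n] - fir(pl_taps, v, n)            # predictor 4022 and combiner 4024
        u = i + fir(fl_taps, q, n)               # filter 4038 and combiner 4026
        uq = step * round(u / step)              # quantizer 4028 (assumed scalar)
        q.append(u - uq)                         # combiner 4036
        vq.append(uq + fir(pl_taps, vq, n))      # predictor 4034 and combiner 4030
        qs.append(v[n] - vq[n])                  # combiner 4014
        sq.append(vq[n] + fir(ps_taps, sq, n))   # predictor 4012 and combiner 4010
    return sq, q
```

For any input, applying [1 - Ps(z)][1 - Pl(z)] to r(n) = s(n) - sq(n) via `one_minus` should give the same sequence as applying [1 - Fs(z)][1 - Fl(z)] to q(n), up to floating-point rounding. A three-tap pitch predictor centered on a lag L can be represented in this sketch by a tap list with L - 2 leading zeros followed by the three tap values.
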
[0119] One advantage of nested two-stage NFC structure 4000 as shown in 

FIG. 4 is that it completely decouples long-term noise feedback coding from 
short-term noise feedback coding. This allows us to use different codec 
structures for long-term NFC and short-term NFC, as the following examples 
illustrate. 



3. Fifth Codec Embodiment - Two Stage Prediction With 
Two Stage Noise Feedback (Nested Two Stage 
Feedback Coding) 

[0120] Due to the above mentioned "decoupling" between the long-term and 

short-term noise feedback coding, predictive quantizer Q" (4008) of codec 
4000 in FIG. 4 can be replaced by codec 2000 in FIG. 2, thus constructing 
another example nested two-stage NFC structure 5000, depicted in FIG. 5 and 
described below. 

[0121] FIG. 5 is a block diagram of a first exemplary arrangement of the 

example nested two-stage NFC structure or codec 5000, according to a fifth 
embodiment of the present invention. Codec 5000 includes the following 
functional elements: a first short-term predictor 5002 (also referred to as a 
short-term predictor Ps(z)); a first combiner or adder 5004; a second combiner 
or adder 5006; a predictive quantizer 5008 (also referred to as a predictive 
quantizer Q'"); a third combiner or adder 5010; a second short-term predictor 
5012 (also referred to as a short-term predictor Ps(z)); a fourth combiner 5014; 
and a short-term noise feedback filter 5016 (also referred to as a short-term 
noise feedback filter Fs(z)). 

[0122] Predictive quantizer Q'" (5008) includes a first combiner 5024, a 

second combiner 5026, either a scalar or a vector quantizer 5028, a third 
combiner 5030, a long-term predictor 5034 (also referred to as a long-term 
predictor Pl(z)), a fourth combiner 5036, and a long-term filter 5038 (also 
referred to as a long-term filter Nl(z)-1). 

[0123] Codec 5000 encodes a sampled input speech signal s(n) to produce a 

coded speech signal, and then decodes the coded speech signal to produce a 
reconstructed output speech signal sq(n), representative of the input speech 
signal s(n). Reconstructed speech signal sq(n) is associated with an overall 
coding noise r(n) = s(n) - sq(n). In coding input speech signal s(n), predictors 
5002 and 5012, combiners 5004, 5006, and 5010, and noise filter 5016 operate 
similarly to corresponding elements described above in connection with 
FIG. 3 having reference numerals decreased by "2000". Therefore, NF codec 
5000 includes an outer or first stage NF loop comprising combiner 5014, 
short-term noise filter 5016, and combiner 5006. This outer NF loop spectrally 
shapes the coding noise associated with codec 5000 according to filter 5016, 
to follow, for example, the short-term spectral characteristics of input speech 
signal s(n). 

[0124] Predictive quantizer 5008 has a structure similar to the structure of NF 

codec 2000 described above in connection with FIG. 2. Predictive quantizer 
Q'" (5008) operates within the outer NF loop mentioned above to predictively 
quantize a predictive quantizer input signal v(n) to produce a predictively 
quantized output signal vq(n) (also referred to as predicted quantizer output 
signal vq(n)) in the following exemplary manner. Predictor 5034 long-term 
predicts input signal v(n) based on output signal vq(n), to produce a predicted 
signal pv(n) (i.e., representing a prediction of signal v(n)). Combiners 5026 
and 5024 collectively combine signal pv(n) with a noise feedback signal fq(n) 
and with input signal v(n) to produce a quantizer input signal u(n). Quantizer 
5028 quantizes input signal u(n) to produce a quantized output signal uq(n) 
(also referred to as a quantizer output signal uq(n)) associated with a 
quantization error or noise signal q(n). Combiner 5036 combines (i.e., 
differences) signals u(n) and uq(n) to produce the quantization noise signal 
q(n). Filter 5038 long-term filters the noise signal q(n) to produce feedback 
noise signal fq(n). Therefore, combiner 5036, long-term filter 5038 and 
combiners 5026 and 5024 form an inner or second stage NF loop nested 
within the outer NF loop. This inner NF loop spectrally shapes the coding 
noise associated with codec 5000 in accordance with filter 5038, to follow, for 
example, the long-term spectral characteristics of input speech signal s(n). 
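
For illustration only, predictive quantizer Q'" (5008) may be sketched in Python in isolation from the outer loop. The sketch assumes an FIR long-term predictor with hypothetical taps `pl_taps`, a one-tap noise feedback filter of the form λ·z^(-lag) standing in for filter 5038 (the names `lam` and `lag` are hypothetical), a uniform scalar quantizer, and the sign convention u(n) = v(n) - pv(n) - fq(n).

```python
def fir(coefs, history, n):
    """Return sum over k >= 1 of coefs[k-1] * history[n-k], with zero initial state."""
    return sum(c * history[n - k] for k, c in enumerate(coefs, start=1) if n - k >= 0)


def predictive_quantizer_fig5(v, pl_taps, lam, lag, step=0.25):
    """Predictive quantizer Q''' in the FIG. 5 arrangement (illustrative sketch only)."""
    vq, q = [], []
    for n in range(len(v)):
        pv = fir(pl_taps, vq, n)                     # predictor 5034, driven by vq(n)
        fq = lam * q[n - lag] if n >= lag else 0.0   # filter 5038 (assumed one-tap)
        u = v[n] - pv - fq                           # combiners 5026 and 5024
        uq = step * round(u / step)                  # quantizer 5028 (assumed scalar)
        q.append(u - uq)                             # combiner 5036
        vq.append(uq + pv)                           # combiner 5030
    return vq, q
```

Under these assumptions the predictive quantization noise is qs(n) = v(n) - vq(n) = q(n) + λ·q(n - lag), so the inner loop imposes the long-term (harmonic) shaping with a single feedback tap.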

[0125] In a second exemplary arrangement of NF codec 5000, predictors 

5002, 5012 are long-term predictors and NF filter 5016 is a long-term noise 
filter (to spectrally shape the coding noise to follow, for example, the long- 
term characteristic of the input speech signal s(n)), while predictor 5034 is a 
short-term predictor and noise filter 5038 is a short-term noise filter (to 
spectrally shape the coding noise to follow, for example, the short-term 
characteristic of the input speech signal s(n)). 

[0126] FIG. 5A is a block diagram of an alternative but mathematically 

equivalent signal combining arrangement 5050 corresponding to the 
combining arrangement including combiners 5024 and 5026 of FIG. 5. 
Combining arrangement 5050 includes a first combiner 5024' and a second 
combiner 5026'. Combiner 5024' receives predictive quantizer input signal 
v(n) and predicted signal pv(n) directly from predictor 5034. Combiner 5024' 
combines these two signals to produce an intermediate signal i(n)'. Combiner 
5026' receives intermediate signal i(n)' and feedback noise signal fq(n) 
directly from noise filter 5038. Combiner 5026' combines these two received 
signals to produce quantizer input signal u(n). Therefore, equivalent 
combining arrangement 5050 is similar to the combining arrangement 
including combiners 5024 and 5026 of FIG. 5. 

4. Sixth Codec Embodiment - Two Stage Prediction With 
Two Stage Noise Feedback (Nested Two Stage 
Feedback Coding) 

[0127] In a further example, the outer layer NFC structure in FIG. 5 (i.e., all 

of the functional blocks outside of predictive quantizer Q'" (5008)) can be 
replaced by the NFC structure 2000 in FIG. 2, thereby constructing a further 
codec structure 6000, depicted in FIG. 6 and described below. 

[0128] FIG. 6 is a block diagram of a first exemplary arrangement of the 

example nested two-stage NF coding structure or codec 6000, according to a 
sixth embodiment of the present invention. Codec 6000 includes the 
following functional elements: a first combiner 6004; a second combiner 
6006; predictive quantizer Q'" (5008) described above in connection with 
FIG. 5; a third combiner or adder 6010; a short-term predictor 6012 (also 
referred to as a short-term predictor Ps(z)); a fourth combiner 6014; and a 
short-term noise feedback filter 6016 (also referred to as a short-term noise 
feedback filter Ns(z)-1). 

[0129] Codec 6000 encodes a sampled input speech signal s(n) to produce a 

coded speech signal, and then decodes the coded speech signal to produce a 
reconstructed output speech signal sq(n), representative of the input speech 
signal s(n). Reconstructed speech signal sq(n) is associated with an overall 
coding noise r(n) = s(n) - sq(n). In coding input speech signal s(n), an outer 
coding structure depicted in FIG. 6, including combiners 6004, 6006, and 
6010, noise filter 6016, and predictor 6012, operates in a manner similar to 
corresponding codec elements of codec 2000 described above in connection 
with FIG. 2 having reference numbers decreased by "4000." A combining 
arrangement including combiners 6004 and 6006 can be replaced by an 
equivalent combining arrangement similar to combining arrangement 5050 
discussed in connection with FIG. 5A, whereby a combiner 6004' (not shown) 
combines signals s(n) and ps(n)' to produce a residual signal d(n) (not shown), 
and then a combiner 6006' (also not shown) combines signals d(n) and fqs(n) 
to produce signal v(n). 

[0130] Unlike codec 2000, codec 6000 includes a predictive quantizer 

equivalent to predictive quantizer 5008 (described above in connection with 
FIG. 5, and depicted in FIG. 6 for descriptive convenience) to predictively 
quantize a predictive quantizer input signal v(n) to produce a quantized output 
signal vq(n). Accordingly, codec 6000 also includes a first stage or outer 
noise feedback loop to spectrally shape the coding noise to follow, for 
example, the short-term characteristic of the input speech signal s(n), and a 
second stage or inner noise feedback loop nested within the outer loop to 
spectrally shape the coding noise to follow, for example, the long-term 
characteristic of the input speech signal. 

[0131] In a second exemplary arrangement of NF codec 6000, predictor 6012 

is a long-term predictor and NF filter 6016 is a long-term noise filter, while 
predictor 5034 is a short-term predictor and noise filter 5038 is a short-term 
noise filter. 

[0132] The flexibility to mix and match different 

single-stage NFC structures in different parts of the nested two-stage NFC 
structure is advantageous. For example, although the codec 5000 in FIG. 5 mixes two different 
types of single-stage NFC structures in the two nested layers, it is actually the 
preferred embodiment of the current invention, because it has the lowest 
complexity among the three systems 4000, 5000, and 6000, respectively 
shown in FIGs. 4, 5 and 6. 

[0133] To see that codec 5000 in FIG. 5 has the lowest complexity, consider 

the inner layer involving long-term NFC first. To get better long-term 
prediction performance, we normally use a three-tap pitch predictor of the 
kind used by Atal and Schroeder in their 1979 paper, rather than a simpler 
one-tap pitch predictor. With Fl(z) = Pl(z/β), the long-term NFC structure 
inside the Q" dashed box has three long-term filters, each with three taps. In 
contrast, by choosing the harmonic noise spectral shape to be the same as the 
frequency response of 

$$N_l(z) = 1 + \lambda z^{-p},$$

we have only a three-tap filter Pl(z) (5034) and a one-tap filter 
(5038) $N_l(z) - 1 = \lambda z^{-p}$ in the long-term NFC structure inside the Q'" dashed 
box (5008) of FIG. 5. Therefore, the inner layer Q'" (5008) of FIG. 5 has a 
lower complexity than the inner layer Q" (4008) of FIG. 4. 
[0134] Now consider the short-term NFC structure in the outer layer of codec 

5000 in FIG. 5. The short-term synthesis filter (including predictor 5012) to the 
right of the Q'" dashed box (5008) does not need to be implemented in the 
encoder (and all three decoders corresponding to FIGs. 4-6 need to implement 
it). The short-term analysis filter (including predictor 5002) to the left of the 
symbol d(n) needs to be implemented anyway even in FIG. 6 (although not 
shown there), because we are using d(n) to derive a weighted speech signal, 
which is then used for pitch estimation. Therefore, comparing the rest of the 
outer layer, FIG. 5 has only one short-term filter Fs(z) (5016) to implement, 
while FIG. 6 has two short-term filters. Thus, the outer layer of FIG. 5 has a 
lower complexity than the outer layer of FIG. 6. 

5. Coding Method 

[0135] FIG. 6A is an example method 6050 of coding a speech or audio signal 

using any one of the example codecs 3000, 4000, 5000, and 6000 described 
above. In a first step 6055, a predictor (e.g., 3002 in FIG. 3, 4002 in FIG. 4, 
5002 in FIG. 5, or 6012 in FIG. 6) predicts an input speech or audio signal 
(e.g., s(n)) to produce a predicted speech signal (e.g., ps(n) or ps(n)'). 

[0136] In a next step 6060, a combiner (e.g., 3004, 4004, 5004, 6004/6006 or 

equivalents thereof) combines the predicted speech signal (e.g., ps(n)) with the 
speech signal (e.g., s(n)) to produce a first residual signal (e.g., d(n)). 

[0137] In a next step 6062, a combiner (e.g., 3006, 4006, 5006, 6004/6006 or 

equivalents thereof) combines a first noise feedback signal (e.g., fqs(n)) with 
the first residual signal (e.g., d(n)) to produce a predictive quantizer input 
signal (e.g., v(n)). 

[0138] In a next step 6064, a predictive quantizer (e.g., Q', Q", or Q'") 

predictively quantizes the predictive quantizer input signal (e.g., v(n)) to 
produce a predictive quantizer output signal (e.g., vq(n)) associated with a 
predictive quantization noise (e.g., qs(n)). 

[0139] In a next step 6066, a filter (e.g., 3016, 4016, or 5016) filters the 

predictive quantization noise (e.g., qs(n)) to produce the first noise feedback 
signal (e.g., fqs(n)). 

[0140] FIG. 6B is a detailed method corresponding to predictive quantizing 

step 6064 described above. In a first step 6070, a predictor (e.g., 3034, 4022, 
or 5034) predicts the predictive quantizer input signal (e.g., v(n)) to produce a 
predicted predictive quantizer input signal (e.g., pv(n)). 

[0141] In a next step 6072 used in all of the codecs 3000-6000, a combiner 

(e.g., 3024, 4024, 5024/5026 or an equivalent thereof, such as 5024') 
combines at least the predictive quantizer input signal (e.g., v(n)) with at least 
the first predicted predictive quantizer input signal (e.g., pv(n)) to produce a 
quantizer input signal (e.g., u(n)). 

[0142] Additionally, the codec embodiments including an inner noise 

feedback loop (that is, exemplary codecs 4000, 5000, and 6000) use further 
combining logic (e.g., combiners 5026/5026' or 4026 or equivalents thereof) 
to further combine a second noise feedback signal (e.g., fq(n)) with the 
predictive quantizer input signal (e.g., v(n)) and the first predicted predictive 
quantizer input signal (e.g., pv(n)), to produce the quantizer input signal (e.g., 
u(n)). 

[0143] In a next step 6076, a scalar or vector quantizer (e.g., 3028, 4028, or 

5028) quantizes the quantizer input signal (e.g., u(n)) to produce a quantizer output 
signal (e.g., uq(n)). 

[0144] In a next step 6078 applying only to those embodiments including the 

inner noise feedback loop, a filter (e.g., 4038 or 5038) filters a quantization 
noise (e.g., q(n)) associated with the quantizer output signal (e.g., uq(n)) to 
produce the second noise feedback signal (e.g., fq(n)). 
[0145] In a next step 6080, deriving logic (e.g., 3034 and 3030 in FIG. 3, 4034 

and 4030 in FIG. 4, and 5034 and 5030 in FIG. 5) derives the predictive 
quantizer output signal (e.g., vq(n)) based on the quantizer output signal (e.g., 
uq(n)). 

III. Overview of Preferred Embodiment (Based on the Fifth 
Embodiment above) 

[0146] We now describe our preferred embodiment of the present invention. 

FIG. 7 shows an example encoder 7000 of the preferred embodiment. FIG. 8 
shows the corresponding decoder. As can be seen, the encoder structure 7000 
in FIG. 7 is based on the structure of codec 5000 in FIG. 5. The short-term 
synthesis filter (including predictor 5012) in FIG. 5 does not need to be 
implemented in FIG. 7, since its output is not used by encoder 7000. 
Compared with FIG. 5, only three additional functional blocks (10, 20, and 95) 
are added near the top of FIG. 7. These functional blocks (also singularly and 
collectively referred to as "parameter deriving logic") adaptively analyze and 
quantize (and thereby derive) the coefficients of the short-term and long-term 
filters. FIG. 7 also explicitly shows the different quantizer indices that are 
multiplexed for transmission to the communication channel. The decoder in 
FIG. 8 is essentially the same as the decoder of most other modern predictive 
codecs such as MPLPC and CELP. No postfilter is used in the decoder. 

[0147] Coder 7000 and coder 5000 of FIG. 5 have the following 

corresponding functional blocks: predictors 5002 and 5034 in FIG. 5 
respectively correspond to predictors 40 and 60 in FIG. 7; combiners 5004, 
5006, 5014, 5024, 5026, 5030 and 5036 in FIG. 5 respectively correspond to 
combiners 45, 55, 90, 75, 70, 85 and 80 in FIG. 7; filters 5016 and 5038 in 
FIG. 5 respectively correspond to filters 50 and 65 in FIG. 7; quantizer 5028 in 
FIG. 5 corresponds to quantizer 30 in FIG. 7; signals vq(n), pv(n), fqs(n), and 
fq(n) in FIG. 5 respectively correspond to signals dq(n), ppv(n), stnf(n), and 
ltnf(n) in FIG. 7; signals sharing the same reference labels in FIG. 5 and FIG. 7 
also correspond to each other. Accordingly, the operation of codec 5000 
described above in connection with FIG. 5 correspondingly applies to codec 
7000 of FIG. 7. 

IV. Short-Term Linear Predictive Analysis and Quantization 

[0148] We now give a detailed description of the encoder operations. Refer to 

FIG. 7. The input signal s(n) is buffered at block 10, which performs short- 
term linear predictive analysis and quantization to obtain the coefficients for 
the short-term predictor 40 and the short-term noise feedback filter 50. This 
block 10 is further expanded in FIG. 9. The processing blocks within FIG. 9 
all employ well-known prior-art techniques. 

[0149] Refer to FIG. 9. The input signal s(n) is buffered at block 11, where it 

is multiplied by an analysis window that is 20 ms in length. If the coding 
delay is not critical, then a frame size of 20 ms and a sub-frame size of 5 ms 
can be used, and the analysis window can be a symmetric window centered at 
the mid-point of the last sub-frame in the current frame. In our preferred 
embodiment of the codec, however, we want the coding delay to be as small as 
possible; therefore, the frame size and the sub-frame size are both selected to 
be 5 ms, and no look ahead is allowed beyond the current frame. In this case, 
an asymmetric window is used. The "left window" is 17.5 ms long, and the 
"right window" is 2.5 ms long. The two parts of the window concatenate to 
give a total window length of 20 ms. Let LWINSZ be the number of samples in 
the left window (LWINSZ = 140 for 8 kHz sampling and 280 for 16 kHz 
sampling), then the left window is given by 

$$wl(n) = \frac{1}{2}\left[1 - \cos\left(\frac{n\pi}{\mathit{LWINSZ} + 1}\right)\right], \quad n = 1, 2, \ldots, \mathit{LWINSZ}.$$

[0150] Let RWINSZ be the number of samples in the right window. Then, 
RWINSZ = 20 for 8 kHz sampling and 40 for 16 kHz sampling. The right 
window is given by 

$$wr(n) = \cos\left(\frac{(n - 1)\pi}{2\,\mathit{RWINSZ}}\right), \quad n = 1, 2, \ldots, \mathit{RWINSZ}.$$


[0151] The concatenation of wl(n) and wr(n) gives the 20 ms asymmetric 

analysis window. When applying this analysis window, the last sample of the 
window is lined up with the last sample of the current frame, so there is no 
look ahead. 
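As an illustration only, the 20 ms asymmetric window can be tabulated as in the following Python sketch. It assumes a raised-cosine rise for the left part and a quarter-cosine decay for the right part; the function name `asymmetric_window` is hypothetical and is not part of the codec description.

```python
import math

def asymmetric_window(lwinsz, rwinsz):
    """Return the concatenated window: lwinsz rising samples, then rwinsz decaying samples."""
    # Left part (assumed shape): half of a raised-cosine window, rising toward 1.
    left = [0.5 * (1.0 - math.cos(n * math.pi / (lwinsz + 1)))
            for n in range(1, lwinsz + 1)]
    # Right part (assumed shape): quarter period of a cosine, starting at exactly 1.
    right = [math.cos((n - 1) * math.pi / (2 * rwinsz))
             for n in range(1, rwinsz + 1)]
    return left + right

# 8 kHz sampling: 140 + 20 samples = 160 samples = 20 ms.
w = asymmetric_window(140, 20)
```

With these shapes the two parts meet near the peak value of 1, so the concatenated window has no abrupt jump at the junction.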

[0152] After the 5 ms current frame of input signal and the preceding 15 ms of 

input signal in the previous three frames are multiplied by the 20 ms window, 
the resulting signal is used to calculate the autocorrelation coefficients r(i) for 
lags i = 0, 1, 2, ..., M, where M is the short-term predictor order, and is chosen 
to be 8 for both 8 kHz and 16 kHz sampled signals. 

[0153] The calculated autocorrelation coefficients are passed to block 12, 

which applies a Gaussian window to the autocorrelation coefficients to 
perform the well-known prior-art method of spectral smoothing. The 
Gaussian window function is given by 

$$gw(i) = e^{-\frac{1}{2}\left(\frac{2\pi i \sigma}{f_s}\right)^2}, \quad i = 0, 1, 2, \ldots, M,$$

where $f_s$ is the sampling rate of the input signal, expressed in Hz, and $\sigma$ is 
40 Hz. 

[0154] After multiplying r(i) by such a Gaussian window, block 12 then 

multiplies r(0) by a white noise correction factor of WNCF = 1 + ε, where ε = 
0.0001. In summary, the output of block 12 is given by 

$$\hat{r}(i) = \begin{cases} (1 + \varepsilon)\, r(0), & i = 0 \\ gw(i)\, r(i), & i = 1, 2, \ldots, M \end{cases}$$


[0155] The spectral smoothing technique smoothes out (widens) sharp 

resonance peaks in the frequency response of the short-term synthesis filter. 
The white noise correction adds a white noise floor to limit the spectral 
dynamic range. Both techniques help to reduce ill conditioning in the 
Levinson-Durbin recursion of block 13. 
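A minimal sketch of the two conditioning steps of block 12 follows. It assumes the usual Gaussian lag-window form exp(-(2*pi*sigma*i/fs)^2 / 2); the function name `smooth_autocorrelation` and its argument names are hypothetical.

```python
import math

def smooth_autocorrelation(r, fs, sigma=40.0, eps=0.0001):
    """Apply a Gaussian lag window to r(1..M) and white noise correction to r(0)."""
    out = []
    for i, ri in enumerate(r):
        # Gaussian window value for lag i (assumed form, sigma in Hz).
        gw = math.exp(-0.5 * (2.0 * math.pi * sigma * i / fs) ** 2)
        out.append(gw * ri)
    # White noise correction: raise r(0) by the factor 1 + eps.
    out[0] = (1.0 + eps) * r[0]
    return out

# Toy input: a flat autocorrelation sequence for M = 8 at 8 kHz sampling.
r_hat = smooth_autocorrelation([1.0] * 9, fs=8000.0)
```

The window tapers the higher lags slightly, which corresponds to widening sharp spectral peaks, while the scaled r(0) acts as a small noise floor.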
[0156] Block 13 takes the autocorrelation coefficients modified by block 12, 

and performs the well-known prior-art method of Levinson-Durbin recursion 
to convert the autocorrelation coefficients to the short-term predictor 
coefficients $\hat{a}_i$, i = 0, 1, ..., M. Block 14 performs bandwidth expansion of 
the resonance spectral peaks by modifying $\hat{a}_i$ as 

$$a_i = \gamma^i \hat{a}_i$$

for i = 0, 1, ..., M. In our particular implementation, the parameter γ is chosen 
as 0.96852. 
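The following sketch illustrates blocks 13 and 14 under one common sign convention, in which the predictor estimates s(n) as the sum of a_i s(n - i) for i = 1..M. The function names `levinson_durbin` and `bandwidth_expand` are hypothetical, and only the coefficients for i = 1..M are returned (the i = 0 term is unaffected by the scaling).

```python
def levinson_durbin(r, order):
    """Return (a, err): predictor coefficients a_1..a_order and final prediction error."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= (1.0 - k * k)
    return a[1:], err

def bandwidth_expand(a, gamma):
    """Scale the i-th coefficient (i = 1..M) by gamma**i."""
    return [(gamma ** i) * ai for i, ai in enumerate(a, start=1)]

# Autocorrelation of a first-order process with correlation 0.5.
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], 2)
expanded = bandwidth_expand(coeffs, 0.96852)
```

For this toy input the recursion returns a first coefficient of 0.5 and a second coefficient of 0, as expected for a first-order process.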

[0157] Block 15 converts the $\{a_i\}$ coefficients to Line Spectrum Pair (LSP) 

coefficients $\{l_i\}$, which are sometimes also referred to as Line Spectrum 
Frequencies (LSFs). Again, the operation of block 15 is a well-known prior- 
art procedure. 

[0158] Block 16 quantizes and encodes the M LSP coefficients to a pre- 

determined number of bits. The output LSP quantizer index array LSPI is 
passed to the bit multiplexer (block 95), while the quantized LSP coefficients 
are passed to block 17. Many different kinds of LSP quantizers can be used in 
block 16. In our preferred embodiment, the quantization of LSP is based on 
inter-frame moving-average (MA) prediction and multi-stage vector 
quantization, similar to (but not the same as) the LSP quantizer used in the 
ITU-T Recommendation G.729. 

[0159] Block 16 is further expanded in FIG. 10. Except for the LSP quantizer 

index array LSPI, all other signal paths in FIG. 10 are for vectors of dimension 
M. Block 161 uses the unquantized LSP coefficient vector to calculate the 
weights to be used later in VQ codebook search with weighted mean-square 
error (WMSE) distortion criterion. The weights are determined as 

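$$w_i = \begin{cases} 1/(l_2 - l_1), & i = 1 \\ 1/\min(l_i - l_{i-1},\; l_{i+1} - l_i), & 1 < i < M \\ 1/(l_M - l_{M-1}), & i = M \end{cases}$$
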
[0160] Basically, the i-th weight is the inverse of the distance between the i-th 

LSP coefficient and its nearest neighbor LSP coefficient. These weights are 
different from those used in G.729. 
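The weight rule can be sketched as follows. The handling of the first and last coefficients, which have only one neighbor, is an assumption made here (the single adjacent gap is used), and the function name `lsp_weights` is hypothetical.

```python
def lsp_weights(lsp):
    """Weight i is the inverse of the distance from LSP i to its nearest neighbor."""
    m = len(lsp)
    weights = []
    for i in range(m):
        gaps = []
        if i > 0:
            gaps.append(lsp[i] - lsp[i - 1])      # distance to the lower neighbor
        if i < m - 1:
            gaps.append(lsp[i + 1] - lsp[i])      # distance to the upper neighbor
        weights.append(1.0 / min(gaps))
    return weights

# Toy LSP vector in ascending order (values chosen only for illustration).
w_lsp = lsp_weights([0.125, 0.25, 0.5, 0.75])
```

Closely spaced LSP coefficients, which tend to mark spectral peaks, therefore receive larger weights in the WMSE search.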

[0161] Block 162 stores the long-term mean value of each of the M LSP 

coefficients, calculated off-line during codec design phase using a large 
training data file. Adder 163 subtracts the LSP mean vector from the 
unquantized LSP coefficient vector to get the mean-removed version of it. 
Block 164 is the inter-frame MA predictor for the LSP vector. In our 
preferred embodiment, the order of this MA predictor is 8. The 8 predictor 
coefficients are fixed and pre-designed off-line using a large training data file. 
With a frame size of 5 ms, this 8th-order predictor covers a time span of 40 ms, 
the same as the time span covered by the 4th-order MA predictor of LSP used 
in G.729, which has a frame size of 10 ms. 

[0162] Block 164 multiplies the 8 output vectors of the vector quantizer block 

166 in the previous 8 frames by the 8 sets of 8 fixed MA predictor coefficients 
and sums up the results. The resulting weighted sum is the predicted vector, 
which is subtracted from the mean-removed unquantized LSP vector by adder 
165. The two-stage vector quantizer block 166 then quantizes the resulting 
prediction error vector. 

[0163] The first-stage VQ inside block 166 uses a 7-bit codebook (128 

codevectors). For the narrowband (8 kHz sampling) codec at 16 kb/s, the 
second-stage VQ also uses a 7-bit codebook. This gives a total encoding rate 
of 14 bits/frame for the 8 LSP coefficients of the 16 kb/s narrowband codec. 
For the wideband (16 kHz sampling) codec at 32 kb/s, on the other hand, the 
second-stage VQ is a split VQ with a 3-5 split. The first three elements of the 
error vector of first-stage VQ are vector quantized using a 5-bit codebook, and 
the remaining 5 elements are vector quantized using another 5-bit codebook. 
This gives a total of (7+5+5) = 17 bits/frame encoding rate for the 8 LSP 
coefficients of the 32 kb/s wideband codec. The selected codevectors from the 
two VQ stages are added together to give the final output quantized vector of 
block 166. 
[0164] During codebook searches, both stages of VQ within block 166 use the 

WMSE distortion measure with the weights $\{w_i\}$ calculated by block 161. 
The codebook indices for the best matches in the two VQ stages (two indices 
for 16 kb/s narrowband codec and three indices for 32 kb/s wideband codec) 
form the output LSP index array LSPI, which is passed to the bit multiplexer 
block 95 in FIG. 7. 

[0165] The output vector of block 166 is used to update the memory of the 

inter-frame LSP predictor block 164. The predicted vector generated by block 
164 and the LSP mean vector held by block 162 are added to the output vector 
of block 166, by adders 167 and 168, respectively. The output of adder 168 is 
the quantized and mean-restored LSP vector. 
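The signal flow of FIG. 10 for the two-stage (narrowband) case can be sketched as below. This is a toy illustration, not the actual quantizer: the codebooks, mean vector, and MA coefficients are placeholders, all function names are hypothetical, and the MA prediction is assumed to operate element by element on the stored VQ output vectors.

```python
def ma_predict(history, coeffs):
    """history[k][i]: VQ output k+1 frames ago; coeffs[k][i]: fixed MA coefficient."""
    dim = len(history[0])
    return [sum(coeffs[k][i] * history[k][i] for k in range(len(history)))
            for i in range(dim)]

def wmse(x, y, w):
    """Weighted mean-square error between vectors x and y."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))

def search(codebook, target, w):
    """Index of the codevector closest to target in the WMSE sense."""
    return min(range(len(codebook)), key=lambda j: wmse(codebook[j], target, w))

def quantize_lsp(lsp, mean, history, coeffs, cb1, cb2, w):
    pred = ma_predict(history, coeffs)                         # block 164
    err = [l - m - p for l, m, p in zip(lsp, mean, pred)]      # adders 163, 165
    i1 = search(cb1, err, w)                                   # first-stage VQ
    res = [e - c for e, c in zip(err, cb1[i1])]
    i2 = search(cb2, res, w)                                   # second-stage VQ
    vq_out = [a + b for a, b in zip(cb1[i1], cb2[i2])]
    history.insert(0, vq_out)                                  # update predictor memory
    history.pop()
    qlsp = [v + p + m for v, p, m in zip(vq_out, pred, mean)]  # adders 167, 168
    return (i1, i2), qlsp

# Toy two-dimensional example with placeholder tables.
hist = [[0.0, 0.0], [0.0, 0.0]]
idx, qlsp = quantize_lsp([0.375, 0.75], [0.25, 0.5], hist,
                         [[0.5, 0.5], [0.25, 0.25]],
                         [[0.0, 0.0], [0.125, 0.125]],
                         [[0.0, 0.0], [0.0, 0.125]], [1.0, 1.0])
```

Because the predictor memory holds only quantizer outputs, a decoder given the same indices can reproduce the same predicted vector.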

[0166] It is well known in the art that the LSP coefficients need to be in a 

monotonically ascending order for the resulting synthesis filter to be stable. 
The quantization performed in FIG. 10 may occasionally reverse the order of 
some of the adjacent LSP coefficients. Block 169 checks for correct ordering 
in the quantized LSP coefficients, and restores correct ordering if necessary. 

The output of block 169 is the final set of quantized LSP coefficients $\{\tilde{l}_i\}$. 

[0167] Now refer back to FIG. 9. The quantized set of LSP coefficients $\{\tilde{l}_i\}$, 

which is determined once a frame, is used by block 17 to perform linear 
interpolation of LSP coefficients for each sub-frame within the current frame. 
In a general coding scheme based on the current invention, there may be two 
or more sub-frames per frame. For example, the sub-frame size can stay at 5 
ms, while the frame size can be 10 ms or 20 ms. In this case, the linear 
interpolation of LSP coefficients is well known in the prior art. In the preferred 
embodiment of the current invention, to keep the coding delay low, the frame 
size is chosen to be 5 ms, the same as the sub-frame size. In this degenerate 
case, block 17 can be omitted. This is why it is shown as a dashed box. 

[0168] Block 18 takes the set of interpolated LSP coefficients and 

converts it to the corresponding set of direct-form linear predictor coefficients 
$\{\tilde{a}_i\}$ for each sub-frame. Again, such a conversion from LSP coefficients to 
predictor coefficients is well known in the art. The resulting set of predictor 
coefficients $\{\tilde{a}_i\}$ is used to update the coefficients of the short-term predictor 
block 40 in FIG. 7. 

[0169] Block 19 performs further bandwidth expansion on the set of predictor 

coefficients $\{\tilde{a}_i\}$ using a bandwidth expansion factor of $\gamma_1$ = 0.75. The 
resulting bandwidth-expanded set of filter coefficients is given by 

$$a'_i = \gamma_1^i\, \tilde{a}_i, \quad \text{for } i = 0, 1, 2, \ldots, M.$$

[0170] This bandwidth-expanded set of filter coefficients $\{a'_i\}$ is used to 

update the coefficients of the short-term noise feedback filter block 50 in FIG. 
7 and the coefficients of the weighted short-term synthesis filter block 21 in 
FIG. 11 (to be discussed later). This completes the description of short-term 
predictive analysis and quantization block 10 in FIG. 7. 

V. Short-Term Linear Prediction of Input Signal 

[0171] Now refer to FIG. 7 again. Except for block 10 and block 95, whose 

operations are performed once a frame, the operations of most of the rest of 
the blocks in FIG. 7 are performed once a sub-frame, unless otherwise noted. 
The short-term predictor block 40 predicts the input signal sample s(n) based 
on a linear combination of the preceding M samples. The adder 45 subtracts 
the resulting predicted value from s(n) to obtain the short-term prediction 
residual signal, or the difference signal, d(n). Specifically, 

$$d(n) = s(n) - \sum_{i=1}^{M} \tilde{a}_i\, s(n - i).$$
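The residual computation of predictor 40 and adder 45 can be sketched as follows. For simplicity the sketch treats samples before the start of the input list as zero, whereas a real encoder would carry the filter memory across sub-frames; the function name `short_term_residual` is hypothetical.

```python
def short_term_residual(s, a):
    """d(n) = s(n) - sum over i = 1..M of a[i-1] * s(n - i), with zero initial memory."""
    m = len(a)
    d = []
    for n in range(len(s)):
        # Prediction from the preceding M samples (missing history counts as zero).
        pred = sum(a[i - 1] * s[n - i] for i in range(1, m + 1) if n - i >= 0)
        d.append(s[n] - pred)
    return d

# A decaying sequence that a first-order predictor with coefficient 0.5 predicts exactly.
d_res = short_term_residual([1.0, 0.5, 0.25, 0.125], [0.5])
```

In this toy case only the first sample survives in the residual, which shows how the predictor removes the redundancy between adjacent samples.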

VI. Long-Term Linear Predictive Analysis and Quantization 

[0172] The long-term predictive analysis and quantization block 20 uses the 

short-term prediction residual signal {d(n)} of the current sub-frame and its 
quantized version {dq(n)} in the previous sub-frames to determine the 
quantized values of the pitch period and the pitch predictor taps. This block 
20 is further expanded in FIG. 11. 
[0173] Now refer to FIG. 11. The short-term prediction residual signal d(n) 

passes through the weighted short-term synthesis filter block 21, whose output 
is calculated as 

$$dw(n) = d(n) + \sum_{i=1}^{M} a'_i\, dw(n - i).$$

[0174] The signal dw(n) is basically a perceptually weighted version of the 

input signal s(n), just like what is done in CELP codecs. This dw(n) signal is 
passed through a low-pass filter block 22, which has a -3 dB cutoff frequency 
at about 800 Hz. In the preferred embodiment, a 4th-order elliptic filter is used 
for this purpose. Block 23 down-samples the low-pass filtered signal to a 
sampling rate of 2 kHz. This represents a 4:1 decimation for the 16 kb/s 
narrowband codec or 8:1 decimation for the 32 kb/s wideband codec. 

[0175] The first-stage pitch search block 24 then uses the decimated 2 kHz 

sampled signal dwd(n) to find a "coarse pitch period", denoted as cpp in 
FIG. 11. A pitch analysis window of 10 ms is used. The end of the pitch 
analysis window is lined up with the end of the current sub-frame. At a 
sampling rate of 2 kHz, 10 ms corresponds to 20 samples. Without loss of 
generality, let the index range of n = 1 to n = 20 correspond to the pitch 
analysis window for dwd(n). Block 24 first calculates the following 
correlation function and energy values 

$$c(k) = \sum_{n=1}^{20} dwd(n)\, dwd(n - k)$$

$$E(k) = \sum_{n=1}^{20} [dwd(n - k)]^2$$

for k = MINPPD - 1 to k = MAXPPD + 1, where MINPPD and MAXPPD are 
the minimum and maximum pitch period in the decimated domain, 
respectively. 

[0176] For the narrowband codec, MINPPD = 4 samples and MAXPPD = 36 

samples. For the wideband codec, MINPPD = 2 samples and MAXPPD = 34 
samples. Block 24 then searches through the calculated {c(k)} array and 
identifies all positive local peaks in the {c(k)} sequence. Let $K_p$ denote the 
resulting set of indices $k_p$ where $c(k_p)$ is a positive local peak, and let the 
elements in $K_p$ be arranged in an ascending order. 

[0177] If there is no positive local peak at all in the {c(k)} sequence, the 
processing of block 24 is terminated and the output coarse pitch period is set 
to cpp = MINPPD. If there is at least one positive local peak, then block 
24 searches through the indices in the set $K_p$ and identifies the index $k_p$ that 
maximizes $c(k_p)^2 / E(k_p)$. Let the resulting index be $k_p^*$. 

[0178] To avoid picking a coarse pitch period that is around an integer 
multiple of the true coarse pitch period, the following simple decision logic is 
used. 

1. If $k_p^*$ corresponds to the first positive local peak (i.e., it is the 
first element of $K_p$), use $k_p^*$ as the final output cpp of block 24 and skip the 
rest of the steps. 

2. Otherwise, go from the first element of $K_p$ to the element of $K_p$ 
that is just before the element $k_p^*$, and find the first $k_p$ in $K_p$ that satisfies 
$c(k_p)^2 / E(k_p) > T_1 [c(k_p^*)^2 / E(k_p^*)]$, where $T_1$ = 0.7. The first $k_p$ that satisfies 
this condition is the final output cpp of block 24. 

3. If none of the elements of $K_p$ before $k_p^*$ satisfies the inequality 
in 2. above, find the first $k_p$ in $K_p$ that satisfies the following two conditions: 

$c(k_p)^2 / E(k_p) > T_2 [c(k_p^*)^2 / E(k_p^*)]$, where $T_2$ = 0.39, and 

$|k_p - cpp'| < T_3\, cpp'$, where $T_3$ = 0.25, and cpp' is the block 24 
output cpp for the last sub-frame. 

The first $k_p$ that satisfies these two conditions is the final output cpp of block 
24. 

4. If none of the elements of $K_p$ before $k_p^*$ satisfies the 
inequalities in 3. above, then use $k_p^*$ as the final output cpp of block 24. 
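The decision logic above can be sketched as follows. The sketch takes the correlation and energy values as dictionaries indexed by lag, and it assumes that a "positive local peak" is a positive value strictly greater than both neighboring lags and that every E(k) is positive; the function name `coarse_pitch` and its argument names are hypothetical.

```python
def coarse_pitch(c, e, minppd, maxppd, cpp_prev, t1=0.7, t2=0.39, t3=0.25):
    """c, e: dicts mapping each lag k in minppd-1 .. maxppd+1 to c(k) and E(k)."""
    # Positive local peaks, in ascending order of lag (assumed peak definition).
    peaks = [k for k in range(minppd, maxppd + 1)
             if c[k] > 0 and c[k] > c[k - 1] and c[k] > c[k + 1]]
    if not peaks:
        return minppd
    score = {k: c[k] * c[k] / e[k] for k in peaks}
    best = max(peaks, key=lambda k: score[k])      # lag with the largest c^2 / E
    earlier = [k for k in peaks if k < best]       # empty when best is the first peak
    for k in earlier:                              # step 2
        if score[k] > t1 * score[best]:
            return k
    for k in earlier:                              # step 3
        if score[k] > t2 * score[best] and abs(k - cpp_prev) < t3 * cpp_prev:
            return k
    return best                                    # steps 1 and 4

# Toy data: a weaker peak at lag 10 and the strongest peak at lag 20.
c_toy = {k: 0.0 for k in range(3, 38)}
e_toy = {k: 1.0 for k in range(3, 38)}
c_toy[10], c_toy[20] = 0.8, 1.0
cpp_a = coarse_pitch(c_toy, e_toy, 4, 36, cpp_prev=10)
cpp_b = coarse_pitch(c_toy, e_toy, 4, 36, cpp_prev=20)
```

In the toy data the peak at lag 10 is chosen only when it is close to the previous coarse pitch period, which illustrates how step 3 favors continuity of the pitch contour over a possible pitch multiple.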
[0179] Block 25 takes cpp as its input and performs a second-stage pitch 

period search in the undecimated signal domain to get a refined pitch period 
pp. Block 25 first converts the coarse pitch period cpp to the undecimated 
signal domain by multiplying it by the decimation factor DECF. (This 
decimation factor DECF = 4 and 8 for narrowband and wideband codecs, 
respectively). Then, it determines a search range for the refined pitch period 
around the value cpp*DECF. The lower bound of the search range is lb = 
max(MINPP, cpp*DECF - DECF + 1), where MINPP = 17 samples is the 
minimum pitch period. The upper bound of the search range is ub = 
min(MAXPP, cpp*DECF + DECF - 1), where MAXPP is the maximum pitch 
period, which is 144 and 272 samples for narrowband and wideband codecs, 
respectively. 

[0180] Block 25 maintains a signal buffer with a total of MAXPP + 1 + 

SFRSZ samples, where SFRSZ is the sub-frame size, which is 40 and 80 
samples for narrowband and wideband codecs, respectively. The last SFRSZ 
samples of this buffer are populated with the open-loop short-term prediction 
residual signal d(n) in the current sub-frame. The first MAXPP + 1 samples 
are populated with the MAXPP + 1 samples of quantized version of d(n), 
denoted as dq(n), immediately preceding the current sub-frame. For 
convenience of equation writing later, we will use dq(n) to denote the entire 
buffer of MAXPP + 1 + SFRSZ samples, even though the last SFRSZ samples 
are really d(n) samples. Again, without loss of generality, let the index range 
from n = 1 to n = SFRSZ denote the samples in the current sub-frame. 

[0181] After the lower bound lb and upper bound ub of the pitch period search 

range are determined, block 25 calculates the following correlation and energy 
terms in the undecimated dq(n) signal domain for time lags k within the search 
range [lb, ub]. 

$$c(k) = \sum_{n=1}^{\mathit{SFRSZ}} dq(n)\, dq(n - k)$$

$$E(k) = \sum_{n=1}^{\mathit{SFRSZ}} [dq(n - k)]^2$$

[0182] The time lag k ∈ [lb, ub] that maximizes the ratio $c^2(k)/E(k)$ is 

chosen as the final refined pitch period. That is, 

$$pp = \arg\max_{k \in [lb,\, ub]} \left[\frac{c^2(k)}{E(k)}\right].$$

[0183] Once the refined pitch period pp is determined, it is encoded into the 

corresponding output pitch period index PPI, calculated as 

$$PPI = pp - 17.$$

[0184] Possible values of PPI are 0 to 127 for the narrowband codec and 0 to 

255 for the wideband codec. Therefore, the refined pitch period pp is encoded 
into 7 bits or 8 bits, without any distortion. 
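A minimal sketch of the second-stage search of block 25 is given below. It assumes the buffer layout described above (MAXPP + 1 past samples followed by SFRSZ current samples) and takes the index as the refined pitch period minus the minimum pitch period of 17; the function name `refine_pitch` is hypothetical, and lags with zero energy are simply skipped.

```python
def refine_pitch(dq, cpp, decf, sfrsz, maxpp, minpp=17):
    """Return (pp, ppi) for a buffer dq of maxpp + 1 + sfrsz samples."""
    lb = max(minpp, cpp * decf - decf + 1)
    ub = min(maxpp, cpp * decf + decf - 1)
    start = len(dq) - sfrsz              # list index of the first current sample
    best_k, best_ratio = lb, float("-inf")
    for k in range(lb, ub + 1):
        # Correlation and energy terms for time lag k.
        c = sum(dq[start + n] * dq[start + n - k] for n in range(sfrsz))
        e = sum(dq[start + n - k] ** 2 for n in range(sfrsz))
        ratio = c * c / e if e > 0 else float("-inf")
        if ratio > best_ratio:
            best_k, best_ratio = k, ratio
    return best_k, best_k - minpp

# Narrowband toy buffer: unit pulses every 50 samples, coarse pitch period 12 (12 * 4 = 48).
buf = [1.0 if i % 50 == 0 else 0.0 for i in range(144 + 1 + 40)]
pp, ppi = refine_pitch(buf, cpp=12, decf=4, sfrsz=40, maxpp=144)
```

The search range here is 45 to 51 samples, and the sketch settles on the true period of 50 samples.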

[0185] Block 25 also calculates ppt1, the optimal tap weight for a single-tap 

pitch predictor, as follows 

$$ppt1 = \frac{c(pp)}{E(pp)}.$$

[0186] Block 27 calculates the long-term noise feedback filter coefficient λ as 

follows. 

$$\lambda = \begin{cases} \mathit{LTWF}, & ppt1 \ge 1 \\ \mathit{LTWF} \cdot ppt1, & 0 < ppt1 < 1 \\ 0, & ppt1 \le 0 \end{cases}$$
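The piecewise rule of block 27 amounts to clamping the single-tap weight to the range 0 to 1 and scaling it by LTWF, as the following sketch shows. The function name `long_term_nf_coeff` is hypothetical, and the value 0.5 used for LTWF in the example is a placeholder chosen only for illustration.

```python
def long_term_nf_coeff(ppt1, ltwf):
    """Long-term noise feedback coefficient from the single-tap pitch predictor weight."""
    if ppt1 >= 1.0:
        return ltwf            # strongly periodic: full weighting
    if ppt1 > 0.0:
        return ltwf * ppt1     # partially periodic: proportional weighting
    return 0.0                 # no useful periodicity: no long-term noise shaping

# Placeholder LTWF value of 0.5, used only for this example.
lam_values = [long_term_nf_coeff(x, 0.5) for x in (2.0, 0.5, -0.3)]
```

The three branches agree at the boundaries, so the coefficient varies continuously with the tap weight.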



[0187] Pitch predictor taps quantizer block 26 quantizes the three pitch 

predictor taps to 5 bits using vector quantization. Rather than minimizing the 
mean-square error of the three taps as in conventional VQ codebook search, 
block 26 finds from the VQ codebook the set of candidate pitch predictor taps 
that minimizes the pitch prediction residual energy in the current sub-frame. 
Using the same dq(n) buffer and time index convention as in block 25, and 
denoting the set of three taps corresponding to the j-th codevector as 
$\{b_{j1}, b_{j2}, b_{j3}\}$, we can express such pitch prediction residual energy as 

$$E_j = \sum_{n=1}^{\mathit{SFRSZ}} \left[dq(n) - \sum_{i=1}^{3} b_{ji}\, dq(n - pp + 2 - i)\right]^2.$$

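[0188] This equation can be re-written as 

$$E_j = \sum_{n=1}^{\mathit{SFRSZ}} dq^2(n) - \mathbf{p}^T \mathbf{x}_j,$$

where 

$$\mathbf{x}_j = [2b_{j1},\, 2b_{j2},\, 2b_{j3},\, -2b_{j1}b_{j2},\, -2b_{j2}b_{j3},\, -2b_{j3}b_{j1},\, -b_{j1}^2,\, -b_{j2}^2,\, -b_{j3}^2]^T,$$

$$\mathbf{p}^T = [v_1,\, v_2,\, v_3,\, \phi_{12},\, \phi_{23},\, \phi_{31},\, \phi_{11},\, \phi_{22},\, \phi_{33}],$$

$$v_i = \sum_{n=1}^{\mathit{SFRSZ}} dq(n)\, dq(n - pp + 2 - i),$$

and 

$$\phi_{ij} = \sum_{n=1}^{\mathit{SFRSZ}} dq(n - pp + 2 - i)\, dq(n - pp + 2 - j).$$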

[0189] In the codec design stage, the optimal three-tap codebooks 

$\{b_{j1}, b_{j2}, b_{j3}\}$, j = 0, 1, 2, ..., 31 are designed off-line. The corresponding 
9-dimensional codevectors $\mathbf{x}_j$, j = 0, 1, 2, ..., 31 are calculated and stored in a 
codebook. In actual encoding, block 26 first calculates the vector $\mathbf{p}^T$, then it 
calculates the 32 inner products $\mathbf{p}^T \mathbf{x}_j$ for j = 0, 1, 2, ..., 31. The codebook 
index j* that maximizes such an inner product also minimizes the pitch 
prediction residual energy $E_j$. Thus, the output pitch predictor taps index 
PPTI is chosen as 

$$PPTI = j^* = \arg\max_{j} \left(\mathbf{p}^T \mathbf{x}_j\right).$$

[0190] The corresponding vector of three quantized pitch predictor taps, 

denoted as ppt in FIG. 11, is obtained by multiplying the first three elements 
of the selected codevector $\mathbf{x}_{j^*}$ by 0.5. 
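The inner-product search of block 26 can be sketched as follows. The ordering of the nine codevector elements is an assumption chosen for illustration (any ordering works as long as the correlation vector uses the same one), the three-entry tap codebook is a placeholder rather than a trained codebook, and all function names are hypothetical.

```python
def tap_codevector(b):
    """Nine-element codevector built from three taps (element ordering assumed)."""
    b1, b2, b3 = b
    return [2 * b1, 2 * b2, 2 * b3,
            -2 * b1 * b2, -2 * b2 * b3, -2 * b3 * b1,
            -b1 * b1, -b2 * b2, -b3 * b3]

def search_pitch_taps(dq, base, sfrsz, pp, tap_codebook):
    """base: list index of the first sample of the current sub-frame in dq."""
    def lag(t, i):
        return dq[t - pp + 2 - i]        # lags pp-1, pp, pp+1 for i = 1, 2, 3
    ts = range(base, base + sfrsz)
    v = [sum(dq[t] * lag(t, i) for t in ts) for i in (1, 2, 3)]
    def phi(i, j):
        return sum(lag(t, i) * lag(t, j) for t in ts)
    p = v + [phi(1, 2), phi(2, 3), phi(3, 1), phi(1, 1), phi(2, 2), phi(3, 3)]
    scores = [sum(pi * xi for pi, xi in zip(p, tap_codevector(b)))
              for b in tap_codebook]
    j_best = max(range(len(scores)), key=scores.__getitem__)
    # Recover the taps as 0.5 times the first three codevector elements.
    taps = [0.5 * x for x in tap_codevector(tap_codebook[j_best])[:3]]
    return j_best, taps

# Toy signal with an exact period of 7 samples, searched at pp = 21 (three periods).
sig = [float((i * i) % 7 - 3) for i in range(100)]
book = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.0, 0.5)]
ppti, ppt = search_pitch_taps(sig, base=60, sfrsz=40, pp=21, tap_codebook=book)
```

Because the toy signal repeats exactly at the lag pp, the entry with a unit center tap gives zero prediction residual and is therefore selected.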
[0191] Once the quantized pitch predictor taps have been determined, block 

28 calculates the open-loop pitch prediction residual signal e(n) as follows. 

$$e(n) = dq(n) - \sum_{i=1}^{3} b_{j^* i}\, dq(n - pp + 2 - i)$$

[0192] Again, the same dq(n) buffer and time index convention of block 25 is 

used here. That is, the current sub-frame of dq(n) for n = 1, 2, ..., SFRSZ is 
actually the unquantized open-loop short-term prediction residual signal d(n). 

[0193] This completes the description of block 20, long-term predictive 

analysis and quantization. 



VII. Quantization of Residual Gain 

[0194] The open-loop pitch prediction residual signal e(n) is used to calculate 

the residual gain. This is done inside the prediction residual quantizer block 
30 in FIG. 7. Block 30 is further expanded in FIG. 12. 

[0195] Refer to FIG. 12. Block 301 calculates the residual gain in the base-2 

logarithmic domain. Let the current sub-frame correspond to time indices 
from n = 1 to n = SFRSZ. For the narrowband codec, the logarithmic gain 
(log-gain) is calculated once a sub-frame as 

$$lg = \log_2\left[\frac{1}{\mathit{SFRSZ}} \sum_{n=1}^{\mathit{SFRSZ}} e^2(n)\right].$$


[0196] For the wideband codec, on the other hand, two log-gains are 

calculated for each sub-frame. The first log-gain is calculated as 

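$$lg(1) = \log_2\left[\frac{2}{\mathit{SFRSZ}} \sum_{n=1}^{\mathit{SFRSZ}/2} e^2(n)\right],$$

and the second log-gain is calculated as 

$$lg(2) = \log_2\left[\frac{2}{\mathit{SFRSZ}} \sum_{n=\mathit{SFRSZ}/2+1}^{\mathit{SFRSZ}} e^2(n)\right].$$
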

[0197] Lacking a better name, we will use the term "gain frame" to refer to 

the time interval over which a residual gain is calculated. Thus, the gain frame 
size is SFRSZ for the narrowband codec and SFRSZ/2 for the wideband codec. 
All the operations in FIG. 12 are done on a once-per-gain-frame basis. 

[0198] The long-term mean value of the log-gain is calculated off-line and 

stored in block 302. The adder 303 subtracts this long-term mean value from 
the output log-gain of block 301 to get the mean-removed version of the log- 
gain. The MA log-gain predictor block 304 is an FIR filter, with order 8 for 
the narrowband codec and order 16 for the wideband codec. In either case, the 
time span covered by the log-gain predictor is 40 ms. The coefficients of this 
log-gain predictor are pre-determined off-line and held fixed. The adder 305 
subtracts the output of block 304, which is the predicted log-gain, from the 
mean-removed log-gain. The scalar quantizer block 306 quantizes the 
resulting log-gain prediction residual. The narrowband codec uses a 4-bit 
quantizer, while the wideband codec uses a 5-bit quantizer here. 

[0199] The gain quantizer codebook index GI is passed to the bit multiplexer 

block 95 of FIG. 7. The quantized version of the log-gain prediction residual 
is passed to block 304 to update the MA log-gain predictor memory. The 
adder 307 adds the predicted log-gain to the quantized log-gain prediction 
residual to get the quantized version of the mean-removed log-gain. The 
adder 308 then adds the log-gain mean value to get the quantized log-gain, 
denoted as qlg. 

[0200] Block 309 then converts the quantized log-gain to the quantized 

residual gain in the linear domain as follows: 

$$g = 2^{qlg/2}.$$
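The gain path of FIG. 12 (blocks 301 through 309) can be sketched for one gain frame as follows. The sketch assumes that the log-gain is the base-2 logarithm of the mean squared residual, so that the linear gain is 2 raised to half the quantized log-gain; the mean value, MA coefficients, and quantizer levels are placeholders, the function names are hypothetical, and the residual is assumed to have nonzero energy.

```python
import math

def log_gain(e):
    """Base-2 log of the mean squared value of the residual over one gain frame."""
    return math.log2(sum(x * x for x in e) / len(e))

def quantize_log_gain(lg, mean, history, coeffs, levels):
    """history: past quantized prediction residuals, most recent first."""
    pred = sum(c * h for c, h in zip(coeffs, history))        # block 304
    target = lg - mean - pred                                  # adders 303, 305
    gi = min(range(len(levels)), key=lambda j: abs(levels[j] - target))  # block 306
    history.insert(0, levels[gi])                              # update MA memory
    history.pop()
    qlg = levels[gi] + pred + mean                             # adders 307, 308
    return gi, qlg, 2.0 ** (qlg / 2.0)                         # block 309

# Toy gain frame with constant amplitude 2, so the mean squared value is 4.
mem = [0.0] * 8
gi, qlg, g = quantize_log_gain(log_gain([2.0] * 40), 1.0, mem,
                               [0.5] * 8, [-1.0, 0.0, 1.0, 2.0])
```

In this toy case the quantized linear gain equals the amplitude of the input residual, as one would expect from an exact quantizer level.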

[0201] Block 310 scales the residual quantizer codebook. That is, it multiplies 

all entries in the residual quantizer codebook by g. The resulting scaled 
codebook is then used by block 311 to perform residual quantizer codebook 
search. 

[0202] The prediction residual quantizer in the current invention of TSNFC 

can be either a scalar quantizer or a vector quantizer. At a given bit-rate, using 
a scalar quantizer gives a lower codec complexity at the expense of lower 
output quality. Conversely, using a vector quantizer improves the output 
quality but gives a higher codec complexity. A scalar quantizer is a suitable 
choice for applications that demand very low codec complexity but can 
tolerate higher bit rates. For other applications that do not require very low 
codec complexity, a vector quantizer is more suitable since it gives better 
coding efficiency than a scalar quantizer. 
[0203] In the next two sections, we describe the prediction residual quantizer 

codebook search procedures in the current invention, first for the case of scalar 
quantization in SQ-TSNFC, and then for the case of vector quantization in 
VQ-TSNFC. The codebook search procedures are very different for the two 
cases, so they need to be described separately. 

VIII. Scalar Quantization of Linear Prediction Residual Signal 

[0204] If the residual quantizer is a scalar quantizer, the encoder structure of 

FIG. 7 is directly used as is, and blocks 50 through 90 operate on a sample-by- 
sample basis. Specifically, the short-term noise feedback filter block 50 of 
FIG. 7 uses its filter memory to calculate the current sample of the short-term 
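noise feedback signal stnf(n) as follows. 

$$stnf(n) = \sum_{i=1}^{M} a'_i\, qs(n - i)$$

[0205] The adder 55 combines stnf(n) with the short-term prediction residual 
d(n) to obtain the signal v(n). 

[0206] Next, using its filter memory, the long-term predictor block 60 
calculates the pitch-predicted value as 

$$ppv(n) = \sum_{i=1}^{3} b_{j^* i}\, dq(n - pp + 2 - i),$$

and the long-term noise feedback filter block 65 calculates the long-term noise 
feedback signal as 

$$ltnf(n) = \lambda\, q(n - pp).$$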
[0207] The adders 70 and 75 together calculate the quantizer input signal 

u(n) as 

$$u(n) = v(n) - [ppv(n) + ltnf(n)].$$

[0208] Next, Block 311 of FIG. 12 quantizes u(n) by simply performing the 

codebook search of a conventional scalar quantizer. It takes the current 
sample of the unquantized signal u(n), finds the nearest neighbor from the 
scaled codebook provided by block 310, passes the corresponding codebook 
index CI to the bit multiplexer block 95 of FIG. 7, and passes the quantized 
value uq(n) to the adders 80 and 85 of FIG. 7. 

[0209] The adder 80 calculates the quantization error of the quantizer block 

30 as 

$$q(n) = u(n) - uq(n).$$

[0210] This q(n) sample is passed to block 65 to update the filter memory of 

the long-term noise feedback filter. 
[0211] The adder 85 adds ppv(n) to uq(n) to get dq(n), the quantized version 

of the current sample of the short-term prediction residual. 

$$dq(n) = uq(n) + ppv(n)$$
[0212] This dq(n) sample is passed to block 60 to update the filter memory of 

the long-term predictor. 
[0213] The adder 90 calculates the current sample of qs(n) as 

$$qs(n) = v(n) - dq(n)$$

and then passes it to block 50 to update the filter memory of the short-term 
noise feedback filter. This completes the sample-by-sample quantization 
feedback loop. 

[0214] We found that for speech signals at least, if the prediction residual 

scalar quantizer operates at a bit rate of 2 bits/sample or higher, the 
corresponding SQ-TSNFC codec output has essentially transparent quality. 
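The sample-by-sample loop of this section can be sketched as follows. Several details are assumptions made for illustration: the sign convention v(n) = d(n) + stnf(n) at adder 55, the use of plain Python lists as filter memories (most recent sample last, pre-filled with enough past samples), and the function name `sq_tsnfc_subframe`. The codebook passed in is taken to be already scaled by the quantized gain.

```python
def sq_tsnfc_subframe(d, a_nf, taps, pp, lam, codebook, dq_hist, q_hist, qs_hist):
    """Quantize one sub-frame of d(n); the three history lists are updated in place."""
    indices = []
    for dn in d:
        # Block 50: short-term noise feedback from past qs samples.
        stnf = sum(a * qs_hist[-i] for i, a in enumerate(a_nf, start=1))
        v = dn + stnf                                   # adder 55 (assumed sign)
        # Block 60: three-tap pitch prediction from past dq samples.
        ppv = sum(b * dq_hist[-(pp - 2 + i)] for i, b in enumerate(taps, start=1))
        ltnf = lam * q_hist[-pp]                        # block 65
        u = v - (ppv + ltnf)                            # adders 70 and 75
        ci = min(range(len(codebook)), key=lambda j: abs(codebook[j] - u))
        uq = codebook[ci]                               # block 311: nearest neighbor
        q_hist.append(u - uq)                           # adder 80
        dq = uq + ppv                                   # adder 85
        dq_hist.append(dq)
        qs_hist.append(v - dq)                          # adder 90
        indices.append(ci)
    return indices

# Toy run: pitch period 17, one short-term feedback tap, three-level scaled codebook.
dq_mem, q_mem, qs_mem = [0.0] * 18, [0.0] * 17, [0.0]
ci_list = sq_tsnfc_subframe([0.3, 0.0], [0.5], (0.0, 1.0, 0.0), 17, 0.5,
                            [-0.5, 0.0, 0.5], dq_mem, q_mem, qs_mem)
```

In the toy run the first sample is rounded up to 0.5, and the resulting quantization noise is fed back into the second sample through the short-term noise feedback filter.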
IX. Vector Quantization of Linear Prediction Residual Signal 

[0215] If the residual quantizer is a vector quantizer, the encoder structure of 

FIG. 7 cannot be used directly as is. An alternative approach and alternative 
structures need to be used. To see this, consider a conventional vector 
quantizer with a vector dimension K. Normally, an input vector is presented 
to the vector quantizer, and the vector quantizer searches through all 
codevectors in its codebook to find the nearest neighbor to the input vector. 
The winning codevector is the VQ output vector, and the corresponding 
address of that codevector is the quantizer output codebook index. If such a 
conventional VQ scheme is to be used with the codec structure in FIG. 7, then 
we need to determine K samples of the quantizer input u(n) at a time. 
Determining the first sample of u(n) in the VQ input vector is not a problem, 
as we have already shown how to do that in the last section. However, the 
second through the K-th samples of the VQ input vector cannot be determined, 
because they depend on the first through the (K - 1)-th samples of the VQ 
output vector of the signal uq(n), which have not been determined yet. 

[0216] The present invention avoids this chicken-and-egg problem by 

modifying the VQ codebook search procedure, as described below beginning 
with reference to FIG. 13A. 

A. General VQ Search 

1. High-Level Embodiment 
a. System 

[0217] FIG. 13A is a block diagram of an example Noise Feedback Coding 

(NFC) system 1300 for searching through N VQ codevectors, stored in a 
scaled VQ codebook 5028a, for a preferred one of the N VQ codevectors to be 
used for coding a speech or audio signal s(n). System 1300 includes scaled 
VQ codebook 5028a including a VQ codebook 1302 and a gain scaling unit 
1304. Scaled VQ codebook 5028a corresponds to quantizer 3028, 4028, 5028, 
or 30, described above in connection with FIGs. 3, 4, 5, or 7, respectively. 

[0218] VQ codebook 1302 includes N VQ codevectors. VQ codebook 1302 

provides each of the N VQ codevectors stored in the codebook to gain scaling 
unit 1304. Gain scaling unit 1304 scales the codevectors, and provides scaled 
codevectors to an output of scaled VQ codebook 5028a. Symbol g(n) 
represents the quantized residual gain in the linear domain, as calculated in 
previous sections. The combination of VQ codebook 1302 and gain scaling 
unit 1304 (also labeled g(n)) is equivalent to a scaled VQ codebook. 

[0219] System 1300 further includes predictor logic unit 1306 (also referred to 

as a predictor 1306), an input vector deriver 1308, an error energy calculator 
1310, a preferred codevector selector 1312, and a predictor/filter restorer 
1314. Predictor 1306 includes combining and predicting logic. Input vector 
deriver 1308 includes combining, filtering, and predicting logic, 
corresponding to such logic used in codecs 3000, 4000, 5000, 6000, and 7000, 
for example, as will be further described below. The logic used in predictor 
1306, input vector deriver 1308, and quantizer 1508a operates sample-by- 
sample in the same manner as described above in connection with codecs 
3000-7000. Nevertheless, the VQ systems and methods are described below 
in terms of performing operations on "vectors" instead of individual samples. 
A "vector" as used herein refers to a group of samples. It is to be understood 
that the VQ systems and methods described below process each of the samples 
in a vector (that is, in a group of samples) one sample at a time. For example, 
a filter filters an input vector in the following manner: a first sample of the 
input vector is applied to an input of the filter; the filter processes the first 
sample of the vector to produce a first sample of an output vector 
corresponding to the first sample of the input vector; and the process repeats 
for each of the next sequential samples of the input vector until there are no 
input vector samples left, whereby the filter sequentially produces each of the 
next samples of the output vector. The last sample of the output vector to be 
produced or output by the filter can remain at the filter output such that it is 
available for processing immediately or at some later sample time (for 
example, to be combined, or otherwise processed, with a sample associated 
with another vector). A predictor predicts an input vector in much the same 
way as the filter processes (that is, filters) the input vector. Therefore, the 
term "vector" is used herein as a convenience to describe a group of samples 
to be sequentially processed in accordance with the present invention. 
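
The sample-by-sample treatment of a "vector" can be illustrated with a small sketch. The class below is a hypothetical direct-form FIR filter, not a block from the figures; it only shows that a vector is consumed one sample at a time and that the filter state left by one vector is available when the next vector arrives.

```python
class FirFilter:
    """Toy FIR filter whose memory persists from one vector to the next."""

    def __init__(self, taps):
        self.taps = list(taps)
        self.memory = [0.0] * len(self.taps)  # most recent input first

    def step(self, x):
        """Consume one input sample and return one output sample."""
        self.memory = [x] + self.memory[:-1]
        return sum(t * m for t, m in zip(self.taps, self.memory))

    def process_vector(self, vector):
        """Filter a group of samples strictly in order, one at a time."""
        return [self.step(x) for x in vector]


f = FirFilter([1.0, 0.5])
first = f.process_vector([1.0, 2.0])   # state after this call is kept
second = f.process_vector([0.0])       # still "remembers" the sample 2.0
```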

b. Methods 

[0220] A brief overview of a method of operation of system 1300 is now 

provided. In the modified VQ codebook search procedure of the current 
invention implemented using system 1300, we provide one VQ codevector at a 
time from scaled VQ codebook 5028a, perform all predicting, combining, and 
filtering functions of predictor 1306 and input vector deriving logic 1308 to 
calculate the corresponding VQ input vector of the signal u(n), and then 
calculate the energy of the quantization error vector of the signal q(n) using 
error energy calculator 1310. This process is repeated N times for the 
codevectors in scaled VQ codebook 5028a, with the filter memories in input 
vector deriving logic 1308 reset to their initial values before we repeat the 
process for each new codevector. After all the codevectors have been tried, 
we have calculated N corresponding quantization error energy values of q(n). 
The VQ codevector that minimizes the energy of the quantization error vector 
is the winning codevector and is used as the VQ output vector. The address of 
this winning codevector is the output VQ codebook index CI that is passed to 
the bit multiplexer block 95. 
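
The search just outlined can be sketched in a few lines. The sketch below is a simplification under stated assumptions: it uses a single noise feedback filter rather than the full structure of FIG. 13A, and the update equations u(n) = d(n) + feedback and q(n) = u(n) - uq(n), the sign conventions, and all function names are illustrative choices rather than the specification's exact signal flow. What it does show faithfully is the procedure: one codevector at a time, filter memory restored before each trial, and the lowest error energy wins.

```python
def run_loop(d, uq, taps, memory):
    """Run the toy noise feedback loop over one vector, sample by sample.

    d      : residual samples d(n) for this vector
    uq     : candidate scaled codevector samples uq(n)
    taps   : assumed noise feedback filter coefficients
    memory : past quantization errors q(n-1), q(n-2), ..., newest first
    Returns the error vector q(n) and the filter memory after the vector.
    """
    mem = list(memory)  # work on a copy so the caller's state stays untouched
    q_vec = []
    for d_n, uq_n in zip(d, uq):
        feedback = sum(f * m for f, m in zip(taps, mem))
        u_n = d_n + feedback  # quantizer input u(n); sign is an assumption
        q_n = u_n - uq_n      # quantization error q(n)
        q_vec.append(q_n)
        mem = [q_n] + mem[:-1]
    return q_vec, mem


def closed_loop_search(d, codebook, gain, taps, memory):
    """Try every codevector and keep the one with the lowest error energy."""
    best_index, best_energy, best_memory = -1, float("inf"), list(memory)
    for index, codevector in enumerate(codebook):
        uq = [gain * c for c in codevector]
        # Every trial starts from the same initial memory (the "reset").
        q_vec, new_memory = run_loop(d, uq, taps, memory)
        energy = sum(q * q for q in q_vec)
        if energy < best_energy:
            best_index, best_energy, best_memory = index, energy, new_memory
    return best_index, best_energy, best_memory
```

The memory returned for the winning codevector would become the initial memory for the next input vector.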

[0221] The bit multiplexer block 95 in FIG. 7 packs the five sets of indices 

LSPI, PPI, PPTI, GI, and CI into a single bit stream. This bit stream is the 
output of the encoder. It is passed to the communication channel. 

[0222] FIG. 13B is a flow diagram of an example method 1350 of searching 

the N VQ codevectors stored in VQ codebook 1302 for a preferred one of the 
N VQ codevectors to be used in coding a speech or audio signal (method 1350 
is also referred to as a prediction residual VQ codebook search of an NFC). 
Method 1350 is implemented using system 1300. With reference to FIGs. 13A 
and 13B, at a first step 1352, predictor 1306 predicts a speech signal s(n) to 
derive a residual signal d(n). Predictor 1306 can include a predictor and a 
combiner, such as predictor 5002 and combiner 5004 discussed above in 
connection with FIG. 5, for example. 

[0223] At a next step 1354, input vector deriver 1308 derives N VQ input 

vectors u(n) each based on the residual signal d(n) and a corresponding one of 
the N VQ codevectors stored in codebook 1302. Each of the VQ input vectors 
u(n) corresponds to one of N VQ error vectors q(n). Input vector deriver 1308 
and step 1354 are described in further detail below. 

[0224] At a next step 1358, error energy calculator 1310 derives N VQ error 

energy values e(n) each corresponding to one of the N VQ error vectors q(n) 
associated with the N VQ input vectors u(n) of step 1354. Error energy 
calculator 1310 performs a squaring operation, for example, on each of the 
error vectors q(n) to derive the energy values corresponding to the error 
vectors. 

[0225] At a next step 1360, preferred codevector selector 1312 selects a 

preferred one of the N VQ codevectors as a VQ output vector uq(n) 
corresponding to the residual signal d(n), based on the N VQ error energy 
values e(n) derived by error energy calculator 1310. 

[0226] Predictor/filter restorer 1314 initializes and restores (that is, resets) the 

filter states and predictor states of various filters and predictors included in 
system 1300, during method 1350, as will be further described below. 

2. Example Specific Embodiment 
a. System 

[0227] FIG. 13C is a block diagram of a portion of an example codec structure 

or system 1362 used in a prediction residual VQ codebook search of TSNFC 
5000 (discussed above in connection with FIG. 5). System 1362 includes 
scaled VQ codebook 5028a, and an input vector deriver 1308a (a specific 
embodiment of input vector deriver 1308) configured according to the 
embodiment of TSNFC 5000 of FIG. 5. Input vector deriver 1308a includes 
essentially the same feedback structure involved in the quantizer codebook 
search as in FIG. 7, except the shorthand z-transform notations of filter blocks 
in FIG. 5 are used. Input vector deriver 1308a includes an outer or first stage 
NF loop including NF filter 5016, and an inner or second stage NF loop 
including NF filter 5038, as described above in connection with FIG. 5. Also, 
all of the filter blocks and adders (combiners) in input vector deriver 1308a 
operate sample-by-sample in the same manner as described in connection with 
FIG. 5. 

b. Methods 

[0228] The method of operation of codec structure 1362 can be considered to 
encompass a single method. Alternatively, the method of operation of codec 
structure 1362 can be considered to include a first method associated with the 
inner NF loop of codec structure 1362 (mentioned above in connection with 
FIG. 13C), and a second method associated with the outer NF loop of the 
codec structure (also mentioned above). The first and second methods 
associated respectively with the inner and outer NF loops of codec structure 
1362 operate concurrently, and in an inter-related manner (that is, together), 
with one another to form the single method. The aforementioned first and 
second methods (that is, the inner and outer NF loop methods, respectively) 
are now described in sequence below. 

[0229] FIG. 13D is an example first (inner NF loop) method 1364 
implemented by system 1362 depicted in FIG. 13C. Method 1364 uses the 
inner NF loop of system 1362, as mentioned above. At a first step 1365, 
combiner 5036 combines each of the N VQ input vectors u(n) (mentioned 
above in connection with FIG. 13A) with the corresponding one of the N VQ 
codevectors from scaled VQ codebook 5028a to produce the N VQ error 
vectors q(n). 

[0230] At a next step 1366, filter 5038 separately filters at least a portion of 

each of the N VQ error vectors q(n) to produce N noise feedback vectors fq(n) 
each corresponding to one of the N VQ codevectors. Filter 5038 can perform 
either long-term or short-term filtering. Filter 5038 filters each of the error 
vectors q(n) on a sample-by-sample basis (that is, the samples of each error 
vector q(n) are filtered sequentially, sample-by-sample). Filter 5038 filters 
each of the N VQ error vectors q(n) based on an initial filter state of the filter 
corresponding to a previous preferred codevector (the previous preferred 
codevector corresponds to a previous residual signal). Therefore, restorer 
1314 restores filter 5038 to the initial filter state before the filter filters each of 
the N VQ codevectors. As would be apparent to one of ordinary skill in the 
speech coding art, the initial filter state mentioned above is typically 
established as a result of processing many, that is, one or more, previous 
preferred codevectors. 

[0231] At a next step 1368, combining logic (5006, 5024, and 5026), 

separately combines each of the N noise feedback vectors fq(n) with the 
residual signal d(n) to produce the N VQ input vectors u(n). 

[0232] FIG. 13E is an example second (outer NF loop) method 1370 executed 

concurrently and together with method 1364 by system 1362. Method 1370 
uses the outer NF loop of system 1362, as mentioned above. At a first step 
1372 of method 1370, combiner 5006 separately combines the residual signal 
d(n) with each of the N noise feedback vectors fqs(n) to produce N predictive 
quantizer input vectors v(n). 

[0233] At a next step 1374, predictor 5034 predicts each of the N predictive 

quantizer input vectors v(n) to produce N predicted, predictive quantizer input 
vectors pv(n). Predictor 5034 predicts input vectors v(n) based on an initial 
predictor state of the predictor corresponding to (that is, established by) the 
previous preferred codevector. Therefore, restorer 1314 restores predictor 
5034 to the initial predictor state before predictor 5034 predicts each of the N 
predictive quantizer input vectors v(n) in step 1374. 

[0234] At a next step 1376, combining logic (e.g., combiners 5024 and 5026) 

separately combines each of the N predictive quantizer input vectors v(n) with 
a corresponding one of the N predicted, predictive quantizer input vectors 
pv(n) to produce the N VQ input vectors u(n). 

[0235] At a next step 1378, a combiner (e.g. combiner 5030) combines each 

of the N predicted, predictive quantizer input vectors pv(n) with corresponding 
ones of the N VQ codevectors, to produce N predictive quantizer output 
vectors vq(n) corresponding to N VQ error vectors qs(n). 

[0236] At a next step 1380, filter 5016 separately filters each of the N VQ 

error vectors qs(n) to produce the N noise feedback vectors fqs(n). Filter 5016 
can perform either long-term or short-term filtering. Filter 5016 filters each of 
the N VQ error vectors qs(n) on a sample-by-sample basis, and based on an 
initial filter state of the filter corresponding to at least the previous preferred 
codevector (see predicting step 1374 above). Therefore, restorer 1314 restores 
filter 5016 to the initial filter state before filter 5016 filters each of the N VQ 
codevectors in step 1380. 

[0237] Alternative embodiments of VQ search systems and corresponding 

methods, including embodiments based on codecs 3000, 4000, and 6000, for 
example, would be apparent to one of ordinary skill in designing speech 
codecs, based on the exemplary VQ search system and methods described 
above. 

[0238] The fundamental ideas behind the modified VQ codebook search 

methods described above are somewhat similar to the ideas in the VQ 
codebook search method of CELP codecs. However, the feedback filter 
structures of input vector deriver 1308 (for example, input vector deriver 
1308a, and so on) are completely different from the structure of a CELP 
codec, and it is not readily obvious to those skilled in the art that such a VQ 
codebook search method can be used to improve the performance of a 
conventional NFC codec or a two-stage NFC codec. 

[0239] Our simulation results show that this vector quantizer approach indeed 

works, gives better codec performance than a scalar quantizer at the same bit 
rate, and also achieves desirable short-term and long-term noise spectral 
shaping. However, according to another novel feature of the current invention 
described below, this VQ codebook search method can be further improved to 
achieve significantly lower complexity while maintaining mathematical 
equivalence. 

B. Fast VQ Search 

[0240] A computationally more efficient codebook search method according 

to the present invention is based on the observation that the feedback structure 
in FIG. 13C, for example, can be regarded as a linear system with the VQ 
codevector out of scaled VQ codebook 5028a as its input signal, and the 
quantization error q(n) as its output signal. The output vector of such a linear 
system can be decomposed into two components: a ZERO-INPUT response 
vector qzi(n) and a ZERO-STATE response vector qzs(n). The ZERO-INPUT 
response vector qzi(n) is the output vector of the linear system when its input 
vector is set to zero. The ZERO-STATE response vector qzs(n) is the output 
vector of the linear system when its internal states (filter memories) are set to 
zero (but the input vector is not set to zero). 
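
The decomposition rests on superposition, which can be checked numerically on a toy loop. The sketch below assumes a single-filter feedback loop with q(n) = d(n) + feedback - uq(n); this recursion, the tap values, and the helper name `response` are illustrative assumptions, not the exact structure of FIG. 13C. Following the description of steps 1434 and 1436 further below, the ZERO-INPUT run keeps d(n) and the initial memories, while the ZERO-STATE run zeroes both.

```python
def response(d, uq, taps, memory):
    """Error vector q(n) of the toy linear feedback loop for one vector."""
    mem = list(memory)  # past q samples, newest first
    q_vec = []
    for d_n, uq_n in zip(d, uq):
        q_n = d_n + sum(f * m for f, m in zip(taps, mem)) - uq_n
        q_vec.append(q_n)
        mem = [q_n] + mem[:-1]
    return q_vec


# Hypothetical signals and filter memories for one 4-sample vector.
d = [0.5, -0.25, 1.0, 0.75]
uq = [1.0, -1.0, 0.5, 0.0]
taps = [0.5, -0.25]
memory = [0.125, -0.5]
zeros = [0.0] * len(d)

q_full = response(d, uq, taps, memory)               # actual response
q_zi = response(d, zeros, taps, memory)              # codevector input zeroed
q_zs = response(zeros, uq, taps, [0.0] * len(taps))  # memories and d(n) zeroed

# Superposition: the full response is the sum of the two components.
max_gap = max(abs(f - (a + b)) for f, a, b in zip(q_full, q_zi, q_zs))
```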

1. High-Level Embodiment 
a. System 

[0241] FIG. 14A is a block diagram of an example NFC system 1400 for 

efficiently searching through N VQ codevectors, stored in the VQ codebook 
1302 of scaled VQ codebook 5028a, for a preferred one of the N VQ 
codevectors to be used for coding a speech or audio signal. System 1400 
includes scaled VQ codebook 5028a, a ZERO-INPUT response filter structure 
1402, a ZERO-STATE response filter structure 1404, a restorer 1414 similar 
to restorer 1314 in FIG. 13A, an error energy calculator 1410 similar to error 
energy calculator 1310 in FIG. 13A, and a preferred codevector selector 1412 
similar to preferred codevector selector 1312 in FIG. 13A. 

b. Methods 

[0242] FIG. 14B is an example, computationally efficient, method 1430 of 

searching through N VQ codevectors for a preferred one of the N VQ 
codevectors, using system 1400. In a first step 1432, predictor 1306 predicts 
speech signal s(n) to derive a residual signal d(n). 

[0243] At a next step 1434, ZERO-INPUT response filter structure 1402 

derives ZERO-INPUT response error vector qzi(n) common to each of the N 
VQ codevectors stored in VQ codebook 1302. 

[0244] At a next step 1436, ZERO-STATE response filter structure 1404 

derives N ZERO-STATE response error vectors qzs(n) each based on a 
corresponding one of the N VQ codevectors stored in VQ codebook 1302. 

[0245] At a next step 1438, error energy calculator 1410 derives N VQ error 

energy values each based on the ZERO-INPUT response error vector qzi(n) 
and a corresponding one of the N ZERO-STATE response error vectors 
qzs(n). Preferred codevector selector 1412 selects the preferred one of the N 
VQ codevectors based on the N VQ error energy values derived by error 
energy calculator 1410. 

[0246] The qzi(n) vector derived at step 1434 captures the effects due to (1) 

initial filter memories in ZERO-INPUT response filter structure 1402, and (2) 
the signal vector of d(n). Since the initial filter memories and the signal d(n) 
are both independent of the particular VQ codevector tried, there is only one 
ZERO-INPUT response vector, and it only needs to be calculated once for 
each input speech vector. 

[0247] During the calculation of the ZERO-STATE response vector qzs(n) at 

step 1436, the initial filter memories and d(n) are set to zero. For each VQ 
codebook vector tried, there is a corresponding ZERO-STATE response vector 
qzs(n). Therefore, for a codebook of N codevectors, we need to calculate N 
ZERO-STATE response vectors qzs(n) for each input speech vector, in one 
embodiment of the present invention. In a more computationally efficient 
embodiment, we calculate a set of N ZERO-STATE response vectors qzs(n) 
for a group of input speech vectors, instead of for each of the input speech 
vectors, as is further described below. 
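
Steps 1434 through 1438 can be sketched as follows, again on a simplified single-filter loop. The recursion, sign conventions, and function names are assumptions made for illustration; only the division of labor comes from the text: the ZERO-INPUT response is computed once per input vector, and each codevector contributes one ZERO-STATE response.

```python
def zero_input_response(d, taps, memory):
    """qzi(n): response to d(n) and the initial memories, codevector zeroed."""
    mem = list(memory)
    qzi = []
    for d_n in d:
        q_n = d_n + sum(f * m for f, m in zip(taps, mem))
        qzi.append(q_n)
        mem = [q_n] + mem[:-1]
    return qzi


def zero_state_response(uq, taps):
    """qzs(n): response to one scaled codevector, memories and d(n) zeroed."""
    mem = [0.0] * len(taps)
    qzs = []
    for uq_n in uq:
        q_n = sum(f * m for f, m in zip(taps, mem)) - uq_n
        qzs.append(q_n)
        mem = [q_n] + mem[:-1]
    return qzs


def fast_search(d, codebook, gain, taps, memory):
    """Pick the codevector minimizing the energy of qzi(n) + qzs(n)."""
    qzi = zero_input_response(d, taps, memory)  # once per input vector
    best_index, best_energy = -1, float("inf")
    for index, codevector in enumerate(codebook):
        qzs = zero_state_response([gain * c for c in codevector], taps)
        energy = sum((a + b) ** 2 for a, b in zip(qzi, qzs))
        if energy < best_energy:
            best_index, best_energy = index, energy
    return best_index, best_energy
```

Because `zero_state_response` does not depend on `d` or `memory`, its results could be tabulated ahead of time, which is the basis of the more efficient embodiment mentioned above.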

2. Example Specific Embodiments 
a. ZERO-INPUT Response 

[0248] FIG. 14C is a block diagram of an example ZERO-INPUT response 

filter structure 1402a (a specific embodiment of filter structure 1402) used 
during the calculation of the ZERO-INPUT response of q(n) of FIG. 13C. 
During the calculation of the ZERO-INPUT response vector qzi(n), certain 
branches in FIG. 13C can be omitted because the signals going through those 
branches are zero. The resulting structure is depicted in FIG. 14C. ZERO- 
INPUT response filter structure 1402a includes filter 5038 associated with an 
inner NF loop of the filter structure, and filter 5016 associated with an outer 
NF loop of the filter structure. 

[0249] The method of operation of codec structure 1402a can be considered to 

encompass a single method. Alternatively, the method of operation of codec 
structure 1402a can be considered to include a first method associated with the 
inner NF loop of codec structure 1402a, and a second method associated with 
the outer NF loop of the codec structure. The first and second methods 
associated respectively with the inner and outer NF loops of codec structure 
1402a operate concurrently, and together, with one another to form the single 
method. The aforementioned first and second methods (that is, the inner and 
outer NF loop methods, respectively) are now described in sequence below. 

[0250] FIG. 14D is an example first (inner NF loop) method 1450 of deriving 

a ZERO-INPUT response using ZERO-INPUT response filter structure 1402a 
of FIG. 14C. Method 1450 includes operation of the inner NF loop of system 
1402a. 

[0251] In a first step 1452, an intermediate vector vzi(n) is derived based on 

the residual signal d(n). 

[0252] In a next step 1454, the intermediate vector vzi(n) is predicted (using 

predictor 5034, for example) to produce a predicted intermediate vector 
vqzi(n). Intermediate vector vzi(n) is predicted based on an initial predictor 
state (of predictor 5034, for example) corresponding to a previous preferred 
codevector. As would be apparent to one of ordinary skill in the speech coding 
art, the initial filter state mentioned above is typically established as a result of 
a history of many, that is, one or more, previous preferred codevectors. 

[0253] In a next step 1456, the intermediate vector vzi(n) and the predicted 

intermediate vector vqzi(n) are combined with a noise feedback vector fqzi(n) 
(using combiners 5026 and 5024, for example) to produce the ZERO-INPUT 
response error vector qzi(n). 

[0254] In a next step 1458, the ZERO-INPUT response error vector qzi(n) is 

filtered (using filter 5038, for example) to produce the noise feedback vector 
fqzi(n). Error vector qzi(n) can be either long-term or short-term filtered. 
Also, error vector qzi(n) is filtered based on an initial filter state (of filter 
5038, for example) corresponding to the previous preferred codevector (see 
predicting step 1454 above). 

[0255] FIG. 14E is an example second (outer NF loop) method 1470 of 

deriving a ZERO-INPUT response, executed concurrently with method 1450, 
using ZERO-INPUT response filter structure 1402a. Method 1470 includes 
operation of the outer NF loop of system 1402a. Method 1470 shares some 
method steps with method 1450, described above. 

[0256] In a first step 1472, the residual signal d(n) is combined with a noise 

feedback signal fqszi(n) (using combiner 5006, for example) to produce an 
intermediate vector vzi(n). 

[0257] At a next step 1474, the intermediate vector vzi(n) is predicted to 

produce a predicted intermediate vector vqzi(n). 

[0258] At a next step 1476, the intermediate vector vzi(n) is combined with 

the predicted intermediate vector vqzi(n) (using combiner 5014, for example) 
to produce an error vector qszi(n). 

[0259] At a next step 1478, the error vector qszi(n) is filtered (using filter 

5016, for example) to produce the noise feedback vector fqszi(n). Error vector 
qszi(n) can be either long-term or short-term filtered. Also, error vector 
qszi(n) is filtered based on an initial filter state (of filter 5038, for example) 
corresponding to the previous preferred codevector (see predicting step 1454 
above). 

b. ZERO-STATE Response 

(1) ZERO-STATE Response - First 
Embodiment 

[0260] FIG. 15A is a block diagram of an example ZERO-STATE response 

filter structure 1404a (a specific embodiment of filter structure 1404) used 
during the calculation of the ZERO-STATE response of q(n) in FIG. 13C. 

[0261] If we choose the vector dimension K to be smaller than the minimum 

pitch period minus one, or K < MINPP - 1, which is true in our preferred 
embodiment, then with zero initial memory, the two long-term filters 5038 and 
5034 in FIG. 13A have no effect on the calculation of the ZERO-STATE 
response vector. Therefore, they can be omitted. The resulting structure 
during ZERO-STATE response calculation is depicted in FIG. 15A. 

[0262] FIG. 15B is a flowchart of an example method 1520 of deriving a 

ZERO-STATE response using filter structure 1404a depicted in FIG. 15A. In 
a first step 1522, an error vector qszs(n) associated with each of the N VQ 
codevectors stored in scaled VQ codebook 5028a is filtered (using filter 5016, 
for example) to produce a ZERO-STATE input vector vzs(n) corresponding to 
each of the N VQ codevectors. Each of the error vectors qszs(n) is filtered 
based on an initially zeroed filter state (of filter 5016, for example). 
Therefore, the filter state is zeroed (using restorer 1414, for example) to 
produce the initially zeroed filter state before each error vector qszs(n) is 
filtered. 

[0263] In a next step 1524, each ZERO-STATE input vector vzs(n) produced 

in filtering step 1522 is separately combined with the corresponding one of the 
N VQ codevectors (using combiner 5036, for example), to produce the N 
ZERO-STATE response error vectors qzs(n). 

(2) ZERO-STATE Response - Second 
Embodiment 

[0264] Note that in FIG. 15A, qszs(n) is equal to qzs(n). Hence, we can 

simply use qszs(n) as the output of the linear system during the calculation of 
the ZERO-STATE response vector. This allows us to simplify FIG. 15A 
further into a simplified structure 1404b in FIG. 16A, which is no more than 
just scaling the VQ codevector by the negative gain -g(n), and then passing 
the result through a feedback filter structure with a transfer function of H(z) = 
1/[1 - Fs(z)]. Therefore, FIG. 16A is a block diagram of filter structure 1404b 
according to a simplified embodiment of ZERO-STATE response filter 
structure 1404. Filter structure 1404b is equivalent to filter structure 1404a of 
FIG. 15A. 

[0265] If we start with a scaled codebook (use g(n) to scale the codebook) as 

mentioned in the description of block 30 in an earlier section, and pass each 
scaled codevector through the filter H(z) with zero initial memory, then, 
subtracting the corresponding output vector from the ZERO-INPUT response 
vector of qzi(n) gives us the quantization error vector of q(n) for that particular 
VQ codevector. 

[0266] FIG. 16B is a flowchart of an example method 1620 of deriving a 

ZERO-STATE response using filter structure 1404b of FIG. 16A. In a first 
step 1622, each of N VQ codevectors is combined with a corresponding one of 
N filtered, ZERO-STATE response error vectors vzs(n) to produce the N 
ZERO-STATE response error vectors qzs(n). 
[0267] At a next step 1624, each of the N ZERO-STATE response error 
vectors qzs(n) is separately filtered to produce the N filtered, ZERO-STATE 
response error vectors vzs(n). Each of the error vectors qzs(n) is filtered based 
on an initially zeroed filter state. Therefore, the filter state is zeroed to 
produce the initially zeroed filter state before each error vector qzs(n) is 
filtered. The following enumerated steps represent an example of processing 
one VQ codevector CV(n) including four samples CV(n)_0 through CV(n)_3 
sample-by-sample according to steps 1622 and 1624 using filter structure 
1404b, to produce a corresponding ZERO-STATE error vector qzs(n) 
including four samples qzs(n)_0 through qzs(n)_3: 

1. combiner 5030 combines first codevector sample CV(n)_0 of 
codevector CV(n) with an initial zero state feedback sample vzs(n)_i from filter 
5034, to produce first error sample qzs(n)_0 of error vector qzs(n) (which 
corresponds to first codevector sample CV(n)_0) (part of step 1622); 

2. filter 5034 filters first error sample qzs(n)_0 to produce a first 
feedback sample vzs(n)_0 of a feedback vector vzs(n) (part of step 1624); 

3. combiner 5030 combines feedback sample vzs(n)_0 with second 
codevector sample CV(n)_1, to produce second error sample qzs(n)_1 (part of 
step 1622); 

4. filter 5034 filters second error sample qzs(n)_1 to produce a second 
feedback sample vzs(n)_1 of feedback vector vzs(n) (part of step 1624); 

5. combiner 5030 combines feedback sample vzs(n)_1 with third 
codevector sample CV(n)_2, to produce third error sample qzs(n)_2 (part of step 
1622); 

6. filter 5034 filters third error sample qzs(n)_2 to produce a third 
feedback sample vzs(n)_2 (part of step 1624); and 

7. combiner 5030 combines feedback sample vzs(n)_2 with fourth (and 
last) codevector sample CV(n)_3, to produce fourth error sample qzs(n)_3, 
whereby the four samples of vector qzs(n) are produced based on the four 
samples of VQ codevector CV(n) (part of step 1622). Steps 1-7 described 
above are repeated for each of the N VQ codevectors in accordance with 
method 1620, to produce the N error vectors qzs(n). 
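
The seven enumerated steps amount to running one codevector through the recursive filter H(z) = 1/[1 - Fs(z)] from a zeroed state. The sketch below assumes Fs(z) is an FIR filter with taps f_1, f_2, ... acting on past error samples and applies the negative gain described for FIG. 16A; the tap values, gain, and function name are hypothetical.

```python
def zero_state_response(codevector, gain, fs_taps):
    """Compute qzs(n) for one codevector, sample by sample.

    Each loop pass is one "combine" step (odd-numbered steps above); the
    feedback sum evaluated at the top of the next pass plays the role of
    the "filter" step (even-numbered steps above).
    """
    history = []  # past qzs samples, newest first; empty means zeroed state
    qzs = []
    for cv_sample in codevector:
        # Feedback from earlier error samples; zero for the first sample.
        feedback = sum(f * h for f, h in zip(fs_taps, history))
        q = -gain * cv_sample + feedback  # combiner output
        qzs.append(q)
        history.insert(0, q)
    return qzs


# Hypothetical 4-sample codevector, gain, and two-tap Fs(z).
qzs = zero_state_response([1.0, 0.0, 0.0, 0.0], 2.0, [0.5, 0.25])
```

With a unit impulse as the codevector, the output is the impulse response of H(z) scaled by -g, which makes the recursion easy to check by hand.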

[0268] This second approach (corresponding to FIGs. 16A and 16B) is 

computationally more efficient than the first (and more straightforward) 
approach (corresponding to FIGs. 15A and 15B). For the first approach, the 
short-term noise feedback filter takes KM multiply-add operations for each 
VQ codevector. For the second approach, only K(K - 1)/2 multiply-add 
operations are needed if K < M. In our preferred embodiment, M = 8, and K = 
4, so the first approach takes 32 multiply-adds per codevector for the short- 
term filter, while the second approach takes only 6 multiply-adds per 
codevector. Even with all other calculations included, the second codebook 
search approach still gives a very significant reduction in the codebook search 
complexity. Note that the second approach is mathematically equivalent to the 
first approach, so both approaches should give an identical codebook search 
result. 
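
The operation counts quoted above can be reproduced with a short calculation. The sketch counts only the short-term filter multiply-adds, as the text does; the function names are illustrative. The saving in the second approach comes from the zeroed initial state: the n-th sample of a vector (counting from zero) has only n nonzero past samples to multiply.

```python
def macs_first_approach(K, M):
    """Every one of the K samples uses all M filter taps."""
    return K * M


def macs_second_approach(K, M):
    """With zeroed memory, sample n has only min(n, M) nonzero past samples."""
    return sum(min(n, M) for n in range(K))


first = macs_first_approach(4, 8)    # preferred embodiment: K = 4, M = 8
second = macs_second_approach(4, 8)  # equals K(K - 1)/2 when K < M
```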

[0269] Again, the ideas behind this second codebook search approach are 

somewhat similar to the ideas in the codebook search of CELP codecs. 
However, the actual computational procedures and the codec structure used 
are quite different, and it is not readily obvious to those skilled in the art how 
the ideas can be used correctly in the framework of two-stage noise feedback 
coding. 

[0270] Using a sign-shape structured VQ codebook can further reduce the 

codebook search complexity. Rather than using a B-bit codebook with 2^B 
independent codevectors, we can use a sign bit plus a (B - 1)-bit shape 
codebook with 2^(B-1) independent codevectors. For each codevector in the (B - 
1)-bit shape codebook, the negated version of it, or its mirror image with 
respect to the origin, is also a legitimate codevector in the equivalent B-bit 
sign-shape structured codebook. Compared with the B-bit codebook with 2^B 
independent codevectors, the overall bit rate is the same, and the codec 
performance should be similar. Yet, with half the number of codevectors, this 
arrangement cuts the number of filtering operations through the filter H(z) = 
1/[1 - Fs(z)] by half, since we can simply negate a computed ZERO-STATE 
response vector corresponding to a shape codevector in order to get the 
ZERO-STATE response vector corresponding to the mirror image of that 
shape codevector. Thus, further complexity reduction is achieved. 
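
A minimal sketch of the sign-shape shortcut follows. It assumes the same toy recursion for H(z) = 1/[1 - Fs(z)] used for illustration elsewhere, and the dictionary keyed by (sign bit, shape index) is a hypothetical layout. Only the shape codevectors are filtered; the mirror-image responses are obtained by negation, which is valid because the filter is linear.

```python
def zero_state_response(codevector, gain, fs_taps):
    """qzs(n) for one codevector through 1/[1 - Fs(z)], zeroed state."""
    history, qzs = [], []
    for cv_sample in codevector:
        feedback = sum(f * h for f, h in zip(fs_taps, history))
        q = -gain * cv_sample + feedback
        qzs.append(q)
        history.insert(0, q)
    return qzs


def sign_shape_responses(shape_codebook, gain, fs_taps):
    """ZERO-STATE responses for all 2 * len(shape_codebook) codevectors."""
    responses = {}
    for index, shape in enumerate(shape_codebook):
        plus = zero_state_response(shape, gain, fs_taps)  # one filtering pass
        responses[(0, index)] = plus                      # sign bit 0: +shape
        responses[(1, index)] = [-x for x in plus]        # sign bit 1: -shape
    return responses


# Hypothetical one-entry shape codebook, gain, and filter taps.
shape_codebook = [[1.0, 0.5, -0.25, 0.0]]
table = sign_shape_responses(shape_codebook, 2.0, [0.5, 0.25])
```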

[0271] In the preferred embodiment of the 16 kb/s narrowband codec, we use 

1 sign bit with a 4-bit shape codebook. With a vector dimension of 4, this 
gives a residual encoding bit rate of (1+4)/4 = 1.25 bits/sample, or 50 
bits/frame (1 frame = 40 samples = 5 ms). The side information encoding 
rates are 14 bits/frame for LSPI, 7 bits/frame for PPI, 5 bits/frame for PPTI, 
and 4 bits/frame for GI. That gives a total of 30 bits/frame for all side 
information. Thus, for the entire codec, the encoding rate is 80 bits/frame, or 
16 kb/s. Such a 16 kb/s codec with a 5 ms frame size and no look ahead gives 
output speech quality comparable to that of G.728 and G.729E. 

[0272] For the 32 kb/s wideband codec, we use 1 sign bit with a 5-bit shape 

codebook, again with a vector dimension of 4. This gives a residual encoding 
rate of (1+5)/4 = 1.5 bits/sample = 120 bits/frame (1 frame = 80 samples = 5 
ms). The side information bit rates are 17 bits/frame for LSPI, 8 bits/frame for 
PPI, 5 bits/frame for PPTI, and 10 bits/frame for GI, giving a total of 40 
bits/frame for all side information. Thus, the overall bit rate is 160 bits/frame, 
or 32 kb/s. Such a 32 kb/s codec with a 5 ms frame size and no look ahead 
gives essentially transparent quality for speech signals. 
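
The bit allocations of the two codecs can be checked with a short calculation. The figures are the ones quoted in the two preceding paragraphs; the function name and the dictionary layout are illustrative only.

```python
def codec_rate(sign_bits, shape_bits, dim, frame_samples, side_bits, frame_ms=5):
    """Return (residual bits/frame, total bits/frame, rate in kb/s)."""
    vectors_per_frame = frame_samples // dim
    residual_bits = (sign_bits + shape_bits) * vectors_per_frame
    total_bits = residual_bits + sum(side_bits.values())
    kbps = total_bits / frame_ms  # bits per millisecond equals kb/s
    return residual_bits, total_bits, kbps


# 16 kb/s narrowband codec: 40 samples per 5 ms frame.
narrowband = codec_rate(1, 4, 4, 40, {"LSPI": 14, "PPI": 7, "PPTI": 5, "GI": 4})

# 32 kb/s wideband codec: 80 samples per 5 ms frame.
wideband = codec_rate(1, 5, 4, 80, {"LSPI": 17, "PPI": 8, "PPTI": 5, "GI": 10})
```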

(3) Further Reduction in Computational 
Complexity 

[0273] The speech signal used in the vector quantization embodiments 

described above can comprise a sequence of speech vectors each including a 
plurality of speech samples. As described in detail above, for example, in 
connection with FIG. 7, the various filters and predictors in the codec of the 
present invention respectively filter and predict various signals to encode 
speech signal s(n) based on filter and predictor (or prediction) parameters (also 
referred to in the art as filter and predictor taps, respectively). The codec of 
the present invention includes logic to periodically derive, that is, update, the 
filter and predictor parameters, and also the gain g(n) used to scale the VQ 
codebook entries, based on the speech signal, once every M speech vectors, 
where M is greater than one. Codec embodiments for periodically deriving 
filter, prediction, and gain scaling parameters were described above in 
connection with FIG. 7. 

[0274] The present invention takes advantage of such periodic updating of the 

aforementioned parameters to further reduce the computational complexity 
associated with calculating the N ZERO-STATE response error vectors qzs(n), 
described above. With reference again to FIG. 16A, the N ZERO-STATE 
response error vectors qzs(n) derived using filter structure 1404b depend on 
only the N VQ codevectors, the gain value g(n), and the filter parameters 
(taps) applied to filter 5034. Since the gain value g(n) and filter taps applied 
to filter 5034 are constant over M speech vectors, that is, between updates, and 
since the N VQ codevectors are also constant, the N ZERO-STATE response 
error vectors qzs(n) corresponding to the N VQ codevectors are 
correspondingly constant over the M speech vectors. Therefore, the N ZERO- 
STATE response error vectors qzs(n) need only be derived when the gain g(n) 
and/or filter parameters for filter 5034 are updated once every M speech 
vectors, thereby reducing the overall computational complexity associated 
with searching the VQ codebook for a preferred one of the VQ codevectors.

[0275] FIG. 17 is a flowchart of an example method 1700 of further reducing 

the computational complexity associated with searching the VQ codebook for 
a preferred one of the VQ codevectors, in accordance with the above 
description. In a first step 1702, a speech signal is received. The speech 
signal comprises a sequence of speech vectors, each of the speech vectors 
including a plurality of speech samples. 

[0276] At a next step 1704, a gain value is derived based on the speech signal
once every M speech vectors, where M is an integer greater than 1.

[0277] At a next step 1706, filter parameters are derived/updated based on the 

speech signal once every T speech vectors, where T is an integer greater than 
one, and where T may, but does not necessarily, equal M.

[0278] At a next step 1708, the N ZERO-STATE response error vectors qzs(n)

are derived once every T and/or M speech vectors (i.e., when the filter 
parameters and/or gain values are updated, respectively), whereby a same set 
of N ZERO-STATE response error vectors qzs(n) is used in selecting a 
plurality of preferred codevectors corresponding to a plurality of speech 
vectors. 
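The reuse described in steps 1704 through 1708 can be sketched as a small
cache that recomputes the N ZERO-STATE responses only when the gain or the
filter taps change. This is a minimal illustration under stated assumptions:
the class name `ZeroStateCache` is hypothetical, and a plain FIR impulse
response stands in for the actual NFC filter structure, which is not
modelled here.

```python
import numpy as np


class ZeroStateCache:
    """Caches zero-state responses of all codevectors between updates."""

    def __init__(self, codebook):
        self.codebook = np.asarray(codebook, dtype=float)  # shape (N, K)
        self._key = None
        self._qzs = None
        self.recomputations = 0

    def get(self, gain, impulse_response):
        # Assumption: the filter is represented by an FIR impulse response.
        h = np.asarray(impulse_response, dtype=float)
        key = (float(gain), tuple(h.tolist()))
        if key != self._key:
            # Parameters changed: filter each scaled codevector from zero
            # initial memory and keep the first K output samples.
            k_dim = self.codebook.shape[1]
            self._qzs = np.array(
                [np.convolve(gain * c, h)[:k_dim] for c in self.codebook]
            )
            self._key = key
            self.recomputations += 1
        return self._qzs
```

Calling `get` once per speech vector with unchanged parameters returns the
stored responses, so the filtering cost is paid once per update interval
rather than once per vector.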

[0279] Alternative embodiments of VQ search systems and corresponding 

methods, including embodiments based on codecs 3000, 4000, and 6000, for 
example, would be apparent to one of ordinary skill in designing speech 
codecs, based on the exemplary VQ search system and methods described 
above. 



C. Further Fast VQ Search Embodiments 

[0280] The present invention provides first and second additional efficient VQ 

search methods, which can be used independently or jointly. The first method 
(described below in Section IX.C.1.) provides an efficient VQ search method 
for a general VQ codebook, that is, no particular structure of the VQ codebook 
is assumed. The second method (described below in Section IX.C.2.) provides 
an efficient method for the excitation quantization in the case where a signed 
VQ codebook is used for the excitation. 

[0281] The first method reduces the complexity of the excitation VQ in NFC 

by reorganizing the calculation of the energy of the error vector for each 
candidate excitation vector, also referred to as a codebook vector. The energy 
of the error vector is the cost function that is minimized during the search of 
the excitation codebook. The reorganization is obtained by: 

1. Expanding the Mean Squared Error (MSE) term of the error vector;

2. Excluding the energy term that is invariant to the candidate
excitation vector; and

3. Pre-computing the energy terms of the ZERO-STATE response
of the candidate excitation vectors that are invariant to the sub-vectors of the
subframe.

[0282] The second method represents an efficient way of searching the 

excitation codebook in the case where a signed codebook is used. The second 
method is obtained by reorganizing the calculation of the energy of the error 
vector in such a way that only half of the total number of codevectors is 
searched. 

[0283] The combination of the first and second methods also provides an 

efficient search. However, there may be circumstances where the first and 
second methods are used separately. For example, if a signed codebook is not 
used, then the second invention does not apply, but the first invention may be 
applicable. 

[0284] For mathematical convenience, the nomenclature used in Sections 

IX.C.1. and 2. below to refer to certain quantities differs from the
nomenclature used in Section IX.B. above to refer to the same or similar
quantities. The following key serves as a guide to map the nomenclature used 
in Section IX.B. above to that used in the following sections. 

[0285] In Section IX.B. above, quantization energy e(n) refers to a 

quantization energy derivable from an error vector q(n), where n is a 
time/sample position descriptor. Quantization energy e(n) and error vector 
q(n) are both associated with a VQ codevector in a VQ codebook. 

[0286] Similarly, in Sections IX.C.1. and 2. below, quantization energy $E_n$
refers to a quantization energy derivable from an error vector $q_n(k)$, where k
refers to the k-th sample of the error vector, and where $k = 1, \ldots, K$ (that
is, K is the total number of samples in the error vector). K is referred to as
the error vector dimension. Quantization energy $E_n$ and error vector $q_n(k)$
are each associated with an n-th VQ codevector of N VQ codevectors (where
$n = 1, \ldots, N$).

[0287] In Section IX.B. above, the ZERO-INPUT response error vector is
denoted qzi(n), where n is the time index. In Sections IX.C.1. and 2. below,
the ZERO-INPUT response error vector is denoted $q_{zi}(k)$, where k refers to
the k-th sample of the ZERO-INPUT response error vector.

[0288] In Section IX.B. above, the ZERO-STATE response error vector is
denoted qzs(n), where n is the time index. In Sections IX.C.1. and 2. below,
the ZERO-STATE response error vector is denoted $q_{zs,n}(k)$, where n denotes
the n-th VQ codevector of the N VQ codevectors, and k refers to the k-th sample
of the ZERO-STATE response error vector.

[0289] Also, Section IX.B. above refers to "frames," for example 5 ms
frames, each corresponding to a plurality of speech vectors. Also, multiple
bits of side information and VQ codevector indices are transmitted by the
coder in each of the frames. In the Sections below, the term "subframe" is
taken to be synonymous with "frame" as used in the Sections above.
Correspondingly, the term "sub-vectors" refers to vectors within a subframe.

1. Fast VQ Search of General (Unsigned) Excitation 
Codebook in NFC system 

a. Straightforward method 

[0290] The energy, $E_n$, of the error vector, $q_n(k)$, of the n-th codevector is
given by

$$E_n = \sum_{k=1}^{K} q_n^2(k) \tag{1}$$

and the optimal codevector, $n_{opt}$, is given by the codevector, $n$, that
minimizes $E_n$, i.e.

$$n_{opt} = \arg\min_{n \in \{1, \ldots, N\}} \{E_n\} \tag{2}$$

where N is the number of codevectors.

[0291] As discussed above in Section IX.B., the error vector, $q_n(k)$, of the
n-th codevector can be calculated as the superposition of the ZERO-INPUT
response, $q_{zi}(k)$, and the ZERO-STATE response, $q_{zs,n}(k)$, of the n-th
codevector, i.e.

$$q_n(k) = q_{zi}(k) + q_{zs,n}(k) \tag{3}$$

[0292] Utilizing this expression, the energy of the error vector, $E_n$, is
expressed as

$$E_n = \sum_{k=1}^{K} \left[ q_{zi}(k) + q_{zs,n}(k) \right]^2 \tag{4}$$

[0293] For an NFC system where the dimension of the excitation VQ, $K$, is
less than the master vector size, $K_M$ (where $K_M$ can be thought of as a frame
size or dimension), there will be multiple excitation vectors to quantize per
master vector (or frame). The master vector size, $K_M$, is typically the
maximum number of samples for which other parameters of the NFC system
remain constant. If the relation between the dimension of the VQ, $K$, and
master vector size, $K_M$, is defined as

$$L = K_M / K \tag{5}$$

L VQs would be performed per master vector. According to the analysis and
assumptions discussed in Section IX.B.2.b.3. above, the ZERO-STATE
responses of the codevectors are unchanged for the L VQs and need only be
calculated once (in the case where the gain and/or filter parameters are
updated once every L VQs). The calculation of all error vector energies for all
codevectors, for all VQs in a master vector will then require

$$C_1 = L \cdot N \cdot K \cdot 2 \tag{6}$$

floating point operations, disregarding the calculation of the ZERO-INPUT
and ZERO-STATE responses. For the example narrowband and wideband
NFC systems described in Section IX.B. above, the parameters of Eq. 6
are $L = 10$, $N = 32$, $K = 4$, and $L = 10$, $N = 64$, $K = 4$, respectively.
Consequently, according to Eq. 6 the number of floating point operations
required would be $C_1 = 2560$ and $C_1 = 5120$, respectively. The example
numbers are summarized in Table 1 below in comparison with the equivalent
numbers for the present invention.

b. Fast VQ Search of General Excitation Codebook
Using Correlation Technique

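[0294] In the present first invention the energy of the error vector of a given
codevector is expanded into

$$\begin{aligned}
E_n &= \sum_{k=1}^{K} \left[ q_{zi}(k) + q_{zs,n}(k) \right]^2 \\
    &= \sum_{k=1}^{K} q_{zi}^2(k) + \sum_{k=1}^{K} q_{zs,n}^2(k) + 2 \sum_{k=1}^{K} q_{zi}(k)\, q_{zs,n}(k) \\
    &= E_{zi} + E_{zs,n} + R(q_{zi}, q_{zs,n})
\end{aligned} \tag{7}$$

where

$$E_{zi} = \sum_{k=1}^{K} q_{zi}^2(k), \tag{8}$$

$$E_{zs,n} = \sum_{k=1}^{K} q_{zs,n}^2(k), \text{ and} \tag{9}$$

$$R(q_{zi}, q_{zs,n}) = 2 \sum_{k=1}^{K} q_{zi}(k)\, q_{zs,n}(k). \tag{10}$$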

[0295] In Eq. 7 the energy of the error vector is expanded into the energy of 

the ZERO-INPUT response, Eq. 8, the energy of the ZERO-STATE response, 
Eq. 9, and two times the cross-correlation between the ZERO-INPUT response 
and the ZERO-STATE response, Eq. 10. 

[0296] The minimization of the energy of the error vector as a function of the 

codevector is independent of the energy of the ZERO-INPUT response since 
the ZERO-INPUT response is independent of the codevector. Consequently, 
the energy of the ZERO-INPUT response can be omitted when searching the 
excitation codebook. Furthermore, since the N energies of the ZERO-STATE 
responses of the codevectors are unchanged for the L VQs, the energies need 
only be calculated once. 

[0297] Consequently, the VQ operation can be expressed as:

$$n_{opt} = \arg\min_{n \in \{1, \ldots, N\}} \left\{ E_{zs,n} + R(q_{zi}, q_{zs,n}) \right\} \tag{11}$$

[0298] In Eq. 11 only the cross-correlation term would be calculated inside the
search loop. The ZERO-STATE response energies, $E_{zs,n}$, $n = 1, \ldots, N$,
would be pre-computed prior to the L VQs as explained above. Using Eq. 9
through Eq. 11 to perform the L VQs would require

$$C_2 = N \cdot K + L \cdot N \cdot (K + 1) \tag{12}$$

floating point operations for the calculations needed to select codevectors for
all L VQs in a master vector, disregarding the calculation of the ZERO-INPUT
and ZERO-STATE responses. For the example narrowband and wideband
NFC systems mentioned above this would result in $C_{2,nb} = 1728$ and
$C_{2,wb} = 3456$ floating point operations, respectively. The example numbers
are summarized in Table 1.
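The search of Eq. 11 can be sketched in Python as follows. This is a minimal
illustration, not the codec's implementation: the function names are
hypothetical, indices are 0-based, the ZERO-INPUT and ZERO-STATE responses
are taken as given arrays, and the cross-correlation term is taken as two
times the correlation sum, as described for Eq. 10.

```python
import numpy as np


def precompute_zero_state_energies(qzs):
    """E_zs,n for every codevector (Eq. 9); done once per master vector."""
    return np.sum(qzs * qzs, axis=1)


def fast_search(qzi, qzs, ezs):
    """Eq. 11: only the cross-correlation is computed per vector."""
    r = 2.0 * (qzs @ qzi)          # cross-correlation term per codevector
    return int(np.argmin(ezs + r))


def straightforward_search(qzi, qzs):
    """Eqs. 2 and 4: full error energy for every codevector."""
    return int(np.argmin(np.sum((qzi + qzs) ** 2, axis=1)))
```

Here `qzs` has shape (N, K) and `qzi` has shape (K,). Because the omitted
ZERO-INPUT energy is the same for every codevector, both functions select the
same index; the saving comes from calling `precompute_zero_state_energies`
once and reusing its result for all L vectors.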

[0299] For narrowband and wideband NFC systems, generally, a significant
reduction in the number of floating point operations is obtained with the
invention. However, it should be noted that the actual reduction depends on
the parameters of the NFC system. In particular, it is obvious that if the VQ
dimension is equal to the dimension of the master vector, i.e.
$K = K_M \Leftrightarrow L = 1$, there is only one VQ per master vector, and
effectively the reuse of the energies of the ZERO-STATE responses is not an
issue.

2. Fast VQ Search of Signed Excitation Codebook in NFC
System

[0300] A second invention devises a way to reduce complexity in the case where
a signed codebook is used for the excitation VQ. In a signed codebook the
codevectors are related in pairs, where the two codevectors in a pair differ
only by the sign of the vector elements, i.e. a first and second codevector in
a pair, $c_1$ and $c_2$, respectively, are related by

$$c_1(k) = -c_2(k), \quad \text{for } k = 1, 2, \ldots, K, \tag{13}$$

where K is the dimension of the vectors. Consequently, for a codebook of N
codevectors, N/2 linearly independent codevectors exist. The remaining N/2
codevectors are given by negating the N/2 linearly independent codevectors as
in Eq. 13. Typically, if B bits are used to represent the N codevectors, i.e.
$B = \log_2(N)$, then the sign is represented by 1 bit, and the linearly
independent codevectors by $B - 1$ bits.

[0301] It is only necessary to store the N/2 linearly independent codevectors
as the remaining N/2 codevectors are easily generated by simple negation.
Furthermore, the ZERO-STATE responses of the remaining N/2 codevectors
are given by a simple negation of the ZERO-STATE responses of the N/2
linearly independent codevectors. Consequently, the complexity of generating
the ZERO-STATE responses is reduced with the use of a signed codebook.

[0302] The present second invention further reduces the complexity of 

searching a signed codebook by manipulating the minimization operation. 

a. Straightforward Method 

[0303] By calculating the energy of the error vectors according to the
straightforward method, see Eq. 2 and Eq. 4, the search is given by

$$\begin{aligned}
\{n_{opt}, s_{opt}\} &= \arg\min_{(n,s) \in \{1, \ldots, N/2\} \times \{+1, -1\}} \{E_{n,s}\} \\
 &= \arg\min_{(n,s) \in \{1, \ldots, N/2\} \times \{+1, -1\}} \left\{ \sum_{k=1}^{K} \left[ q_{zi}(k) + s \cdot q_{zs,n}(k) \right]^2 \right\}
\end{aligned} \tag{14}$$

where $s$ is the sign and $n \in \{1, \ldots, N/2\}$ represents the N/2 linearly
independent codevectors. In practice both of the two signs are checked for each
of the N/2 linearly independent codevectors without applying the multiplication
with the sign, which would unnecessarily increase the complexity. The number of
floating point operations needed to calculate the energy of the error vector for
all of the combined N codevectors for all of the L VQs would remain as
specified by Eq. 6,

$$C_1 = L \cdot N \cdot K \cdot 2 \tag{15}$$

Note that this figure excludes the calculations of the ZERO-INPUT and
ZERO-STATE responses. Nevertheless, once the ZERO-INPUT and ZERO-STATE
responses are calculated the complexity of the remaining operations
remains unchanged. The number of floating point operations for the
narrowband and wideband example is, as above, $C_{1,nb} = 2560$ and
$C_{1,wb} = 5120$, respectively.



b. Fast VQ Search of Signed Excitation Codebook
Using Correlation Technique



[0304] Similar to the first invention the term of the energy of the error vector
is expanded, except for the further incorporation of the property of a signed
codebook.

$$\begin{aligned}
E_{n,s} &= \sum_{k=1}^{K} \left[ q_{zi}(k) + s \cdot q_{zs,n}(k) \right]^2 \\
 &= \sum_{k=1}^{K} q_{zi}^2(k) + \sum_{k=1}^{K} q_{zs,n}^2(k) + 2 s \sum_{k=1}^{K} q_{zi}(k)\, q_{zs,n}(k) \\
 &= E_{zi} + E_{zs,n} + s \cdot R(q_{zi}, q_{zs,n})
\end{aligned} \tag{16}$$

where $s$ is the sign and $n \in \{1, \ldots, N/2\}$ represents the N/2 linearly
independent codevectors. In Eq. 16 the energy of the error vector is examined
for a pair of codevectors in the signed codebook. According to Eq. 16 the
energy of the error vector can be expanded into the energy of the ZERO-INPUT
response, Eq. 8, the energy of the ZERO-STATE response, Eq. 9, and two times
the cross-correlation between the ZERO-INPUT response and the ZERO-STATE
response, Eq. 10. The sign of the cross-correlation term depends on the sign
of the codevector. The minimization of the energy of the error vector as a
function of the codevector is independent of the energy of the ZERO-INPUT
response since the ZERO-INPUT response is independent of the codevector.
Consequently, the energy of the ZERO-INPUT response can be omitted when
searching the excitation codebook, and the search is given by

$$\begin{aligned}
\{n_{opt}, s_{opt}\} &= \arg\min_{(n,s) \in \{1, \ldots, N/2\} \times \{+1, -1\}} \left\{ E_{zi} + E_{zs,n} + s \cdot R(q_{zi}, q_{zs,n}) \right\} \\
 &= \arg\min_{(n,s) \in \{1, \ldots, N/2\} \times \{+1, -1\}} \left\{ E_{zs,n} + s \cdot R(q_{zi}, q_{zs,n}) \right\}
\end{aligned} \tag{17}$$

[0305] From Eq. 17 it is evident that if a pair of codevectors, i.e. $s = \pm 1$,
are considered jointly, the two minimization terms, $E_{n,s=+1}$ and
$E_{n,s=-1}$ (with the common term $E_{zi}$ omitted), are given by

$$E_{n,s=+1} = E_{zs,n} + R(q_{zi}, q_{zs,n}), \text{ and} \tag{18}$$

$$E_{n,s=-1} = E_{zs,n} - R(q_{zi}, q_{zs,n}), \tag{19}$$

respectively. Evidently, if the cross-correlation term $R(q_{zi}, q_{zs,n})$ is
less than zero, the codevector with the positive sign will provide a smaller
minimization term and only $E_{n,s=+1}$ needs to be computed and checked.
Otherwise, if the cross-correlation term $R(q_{zi}, q_{zs,n})$ is greater than
zero, the codevector with the negative sign will provide a smaller minimization
term and only $E_{n,s=-1}$ needs to be computed and checked. If the
cross-correlation term is zero, either of the two can be checked since the two
signs will provide identical minimization terms. Consequently, the search can
be specified as

$$\{n_{opt}, s_{opt}\} = \arg\min_{n \in \{1, \ldots, N/2\}} \{E_{n,s}\}, \quad
\begin{cases}
E_{n,s} = E_{zs,n} + R(q_{zi}, q_{zs,n}),\ s = +1 & \text{if } R(q_{zi}, q_{zs,n}) < 0 \\
E_{n,s} = E_{zs,n} - R(q_{zi}, q_{zs,n}),\ s = -1 & \text{else}
\end{cases} \tag{20}$$

where the less-than sign is interchangeable with a less-than-or-equal sign. The
number of floating point operations needed to calculate the energy of the error
vector for all of the combined N codevectors for all of the L VQs according
to the search specified by Eq. 20 is

$$C_3 = L \cdot \frac{N}{2} \cdot (2K + 1) = L \cdot N \cdot (K + 1/2) \tag{21}$$

again disregarding the calculation of the ZERO-INPUT and ZERO-STATE
responses. The number of floating point operations for the example
narrowband and wideband NFC systems is $C_{3,nb} = 1440$ and $C_{3,wb} = 2880$,
respectively. The example numbers are summarized in Table 1.
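The sign-selection rule of Eq. 20 can be sketched as below and compared
against a search that tests both signs of every shape codevector. This is an
illustrative sketch under stated assumptions: the function names are
hypothetical, indices are 0-based, `qzs_shape` holds the ZERO-STATE responses
of the N/2 positive-sign codevectors, and the cross-correlation term includes
the factor of two described for Eq. 10.

```python
import numpy as np


def signed_search(qzi, qzs_shape):
    """Eq. 20: one minimization term per shape codevector."""
    best = None
    for n, q in enumerate(qzs_shape):
        e = float(np.dot(q, q))            # ZERO-STATE energy
        r = 2.0 * float(np.dot(qzi, q))    # cross-correlation term
        if r < 0:
            term, s = e + r, +1            # positive codevector wins
        else:
            term, s = e - r, -1            # negative codevector wins
        if best is None or term < best[0]:
            best = (term, n, s)
    return best[1], best[2]


def exhaustive_signed_search(qzi, qzs_shape):
    """Reference: full error energy for both signs of every codevector."""
    best = None
    for n, q in enumerate(qzs_shape):
        for s in (+1, -1):
            energy = float(np.sum((qzi + s * q) ** 2))
            if best is None or energy < best[0]:
                best = (energy, n, s)
    return best[1], best[2]
```

The first function evaluates N/2 candidates instead of N, yet returns the same
index and sign as the reference whenever the minimum is unique.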

[0306] This method would also apply to a signed sub-codebook within a
codebook, i.e. where a subset of the codevectors of the codebook makes up a
signed codebook. It is then possible to apply the invention to the signed
sub-codebook.

3. Combination of Efficient Search Methods

[0307] If the number of VQs per master vector, $L$, is greater than one, and a
signed codebook (or sub-codebook) is used, it is advantageous to combine the
two methods above. In this case the energies of the ZERO-STATE responses,
$E_{zs,n}$, $n = 1, \ldots, N/2$, in Eq. 20 remain unchanged for the L VQs and
are pre-calculated according to the first method. The number of floating point
operations needed to calculate the energy of the error vector for all of the
combined codevectors for all of the L VQs is

$$C_4 = \frac{N}{2} \cdot K + L \cdot \frac{N}{2} \cdot (K + 1)
      = \frac{1}{2} \left( N \cdot K + L \cdot N \cdot (K + 1) \right) \tag{22}$$

For the example narrowband and wideband NFC systems the number of
floating point operations is $C_{4,nb} = 864$ and $C_{4,wb} = 1728$, respectively.
The example numbers are summarized in Table 1.
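The four operation counts quoted in this section can be reproduced with a few
lines of Python. The function names are illustrative only; the expressions
are those of Eq. 6, Eq. 12, Eq. 21, and Eq. 22.

```python
def flops_straightforward(L, N, K):
    """Eq. 6: full error energy for every codevector and every VQ."""
    return L * N * K * 2


def flops_precomputed_energies(L, N, K):
    """Eq. 12: energies once, then correlation plus one addition per VQ."""
    return N * K + L * N * (K + 1)


def flops_signed(L, N, K):
    """Eq. 21: energy, correlation and one add/subtract for N/2 shapes."""
    return L * (N // 2) * (2 * K + 1)


def flops_combined(L, N, K):
    """Eq. 22: both techniques together."""
    return (N // 2) * K + L * (N // 2) * (K + 1)


narrowband = dict(L=10, N=32, K=4)
wideband = dict(L=10, N=64, K=4)
counts_nb = [f(**narrowband) for f in (flops_straightforward,
                                       flops_precomputed_energies,
                                       flops_signed, flops_combined)]
counts_wb = [f(**wideband) for f in (flops_straightforward,
                                     flops_precomputed_energies,
                                     flops_signed, flops_combined)]
```

With the example parameters, `counts_nb` and `counts_wb` match the narrowband
and wideband figures given in the text for the four methods.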



4. Method Flow Charts

[0308] The methods of the present invention, described in Sections IX.C.1. 

and 2., are used in an NFC system to quantize a prediction residual signal. 
More generally, the methods are used in an NFC system to quantize a residual 
signal. That is, the residual signal is not limited to a prediction residual signal, 
and thus, the residual signal may include a signal other than a prediction 
residual signal. The prediction residual signal (and more generally, the 
residual signal) includes a series of successive residual signal vectors. Each 
residual signal vector needs to be quantized. Therefore, the methods of the 
present invention search for and select a preferred one of a plurality of 
candidate codevectors corresponding to each residual vector. Each preferred 
codevector represents the excitation VQ of the corresponding residual signal 
vector. 

[0309] FIG. 18 is a flow chart of an example method 1800 of quantizing 

multiple vectors, for example, residual signal vectors, in a master vector (or 
frame), according to the correlation techniques described in Sections IX.C.1.
and IX.C.2. Method 1800 is implemented in an NFC system. For example,
method 1800 is useable with the exemplary NFC systems, structures, and
methods described in connection with FIGs. 1-17, to the extent excitation VQ
is used in these systems, structures, and methods. Each of these NFC systems
includes at least one noise feedback loop/filter to shape coding noise.

[0310] In one arrangement, method 1800 uses an unsigned or general VQ
codebook including N unsigned candidate codevectors (see Section IX.C.1.b.
above).

[0311] In another arrangement, method 1800 uses a signed VQ codebook
including N signed candidate codevectors (see Section IX.C.2.b. above). For
example, the signed VQ codebook represents a product of:

[0312] a shape code, Cshape = {c1, c2, c3, ..., cN/2}, including N/2 shape
codevectors cn, and

[0313] a sign code, Csign = {+1, -1}, including a pair of oppositely-signed sign
values +1 and -1, such that a positive codevector and a negative codevector
(referred to as the signed codevectors) associated with each shape codevector
cn each represent a product of the shape codevector and a corresponding one
of the sign values. Thus, the N/2 shape codevectors, when combined with the
sign code, correspond to N signed codevectors. That is, first and second
oppositely signed codevectors are associated with each of the shape
codevectors.

[0314] Method 1800 assumes there are L vectors in the master vector (or 

frame) and that the ZERO-STATE responses of the N codevectors (which may 
be signed or unsigned, as mentioned above) are invariant over the L vectors, 
because gain and/or filter parameters in the NFC system are updated only once 
every L vectors. 

[0315] At a first step 1805, N ZERO-STATE responses, each corresponding 

to a respective one of the N codebook vectors, are calculated. The N
ZERO-STATE responses may be calculated using the NFC filter structures of
FIGs. 15A and 16A, and associated methods, for example.

[0316] At a next step 1810, N ZERO-STATE energies, corresponding to the N
ZERO-STATE responses of step 1805, are calculated.

[0317] At a next step 1815, an initial one of the L vectors in the frame to be
quantized is identified.

[0318] Next, a loop including steps 1820, 1825, 1830, 1835 and 1840 is 

repeated for each of the vectors to be quantized in the frame. Each iteration of 
the loop produces an excitation VQ corresponding to a successive one of the 
vectors in the frame, beginning with the initial vector. At first step 1820 of the 
loop, a ZERO-INPUT response corresponding to the given (that is, identified) 
vector is calculated. For example, in the first iteration of the loop, a ZERO- 
INPUT response corresponding to the first vector in the frame is calculated. 
The ZERO-INPUT response may be calculated using the NFC filter structure 
described above in connection with FIG. 14C, and methods associated 
therewith, for example. 

[0319] At a next step 1825, a best or preferred codevector is selected from 

among the N codevectors based on minimization terms. The minimization 
terms are derived based on the N ZERO-STATE energies from step 1810, and 
cross-correlations between the ZERO-INPUT response from step 1820 and 
ZERO-STATE responses from step 1805. In the arrangement of method 1800 
using unsigned codevectors, step 1825 is governed by Eq. 11 of Section
IX.C.1.b. above. In the arrangement of method 1800 using signed
codevectors, step 1825 is governed by Eq. 20 of Section IX.C.2.b. above. Step
1825 is described further below in connection with FIGs. 19 and 20. 

[0320] At a next step 1830, filter memories in the NFC system used to 

implement method 1800 are updated using the best or preferred codevector 
selected in step 1825. 

[0321] At a decision step 1835, it is determined whether a last one of the 

vectors in the frame has been quantized. If yes, then the method is done. On 
the other hand, if further vectors in the frame remain to be quantized, flow
proceeds to a step 1840, and a next one of the vectors to be quantized in the
frame is identified. The quantization loop repeats for the next vector, and so
on, for each of the L vectors in the frame.
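The control flow of method 1800 with an unsigned codebook can be sketched as
a single function. This is an illustration under stated assumptions, not the
NFC implementation: `zero_input_response` and `update_memory` are hypothetical
caller-supplied callables standing in for the NFC filter structures, `state`
is an opaque filter-memory object, indices are 0-based, and the
cross-correlation term includes the factor of two described for Eq. 10.

```python
import numpy as np


def quantize_frame(num_vectors, qzs, zero_input_response, update_memory, state):
    """Quantize the L vectors of one frame (steps 1810 through 1840)."""
    # Step 1810: ZERO-STATE energies, computed once for the whole frame.
    ezs = np.sum(qzs * qzs, axis=1)
    indices = []
    for _ in range(num_vectors):                 # steps 1815, 1835, 1840
        qzi = zero_input_response(state)         # step 1820
        terms = ezs + 2.0 * (qzs @ qzi)          # minimization terms, Eq. 11
        n = int(np.argmin(terms))                # step 1825
        state = update_memory(state, n)          # step 1830
        indices.append(n)
    return indices, state
```

The array `qzs` of shape (N, K) corresponds to the ZERO-STATE responses of
step 1805, which the caller computes before the loop. Only the ZERO-INPUT
response and the N cross-correlations are recomputed for each vector.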

[0322] FIG. 19 is a flowchart of an example method 1900 expanding on step 

1825 of FIG. 18, using a general, or unsigned VQ codebook. In other words, 
method 1900 corresponds to a VQ search of an unsigned VQ codebook, as 
described in Section IX.C.1.b., above. Method 1900 represents a search of the 
N candidate codevectors in the codebook to select the preferred codevector to 
be used as the excitation quantization in step 1825. At a first step 1905, a first 
one of the N codevectors to be examined/tested is identified. Next, a search 
loop, including steps 1910 through 1945, is repeated for each of the N 
codevectors, beginning with the first codevector identified in step 1905. 

[0323] At initial step 1910 of the loop, one of the ZERO-STATE responses 

calculated in step 1805 is retrieved. The retrieved ZERO-STATE response 
corresponds to the codevector being tested during the current iteration of the 
search loop. For example, the first time through the loop, the ZERO-STATE 
response corresponding to the first codevector is retrieved. 

[0324] At a next step 1915, a cross-correlation between the ZERO-STATE 

response and the ZERO-INPUT response (from step 1820) is calculated. The
cross-correlation produces a correlation term (also referred to as a "correlation
result").

[0325] At a next step 1920, the ZERO-STATE energy, corresponding to the 

ZERO-STATE response of step 1910, is retrieved. 

[0326] At a next step 1925, a minimization term, corresponding to the 

codevector being tested in the current iteration of the search loop, is 
calculated. The minimization term is based on the retrieved ZERO-STATE 
energy, and a cross-correlation between the ZERO-STATE response of the 
codevector being tested and the ZERO-INPUT response. The ZERO-STATE 
energy and the cross-correlation term are combined (for example, the
ZERO-STATE energy and cross-correlation term are added as in Eq. 11, and as
in Eq. 20 when the cross-correlation term is negative).

[0327] At next steps 1930 and 1935, the current minimization term (just
calculated in step 1925) is compared to the minimization terms resulting from
previous iterations through the search loop, to identify a current best 
minimization term from among all of the minimization terms calculated thus 
far. The codevector corresponding to this current best minimization term is 
also identified. 

[0328] At a next step 1940, it is determined whether a last one of the N
codevectors has been tested. If yes, then the method is done because the
codebook has been searched, and a preferred codevector has been determined.
However, if no, then at step 1945 a next one of the N codevectors to be tested
is identified, and the search loop is repeated.

[0329] Assuming N iterations of the loop in method 1900 for each vector to be
quantized, then method 1900 performs the following steps:

deriving N correlation values using the NFC system (step 1915), each
of the N correlation values corresponding to a respective one of the N VQ
codevectors;

combining each of the N correlation values with a corresponding one
of N ZERO-STATE energies of the NFC system (step 1925), thereby
producing N minimization values each corresponding to a respective one of
the N VQ codevectors; and

selecting a preferred one of the N VQ codevectors based on the N
minimization values (steps 1930 and 1935), whereby the preferred VQ
codevector is usable as an excitation quantization corresponding to a
prediction residual signal (and more generally, to a residual signal) derived
from a speech or audio signal.

[0330] Since the prediction residual signal (more generally, the residual

signal) includes a series of prediction residual vectors (more generally, a series 
of residual vectors), and method 1900 is repeated for each of the residual 
vectors in accordance with method 1800, overall the method produces an 
excitation quantization corresponding to each of the prediction residual 
vectors (and more generally, to each of the residual vectors).

[0331] FIG. 20 is a flow chart of an example method 2000 expanding on step

1825, using a signed VQ codebook. Therefore, method 2000 quantizes 
vectors according to the techniques described in Section IX.C.2.b. above, and 
thus corresponds to a VQ search of a signed codebook. Method 2000 reduces 
search complexity even in the case where there is only one vector per frame, 
that is, where L = 1. In this case, the ZERO-STATE responses of the signed 
codevectors are calculated for each residual vector to be quantized, rather than 
once every several residual vectors (that is, when L is greater than 1). 

[0332] In a first step 2005, a first shape codevector to be tested (for example, 

codevector c1) in the shape codebook is identified. 

[0333] At a next step 2010, the ZERO-STATE response of the shape 

codevector is retrieved. 

[0334] At a next step 2015, the energy of the ZERO-STATE response of step 

2010 is retrieved. 

[0335] At a next step 2020, a cross-correlation term between the ZERO- 

STATE response of the shape codevector and the ZERO-INPUT response is 
calculated. The sign of the cross-correlation term may be a first value (for 
example, negative) or a second value (for example, positive). 

[0336] At a next step 2025, the sign value of the cross-correlation term is 

determined. For example, it is determined whether the cross-correlation term 
is positive. If yes (the cross-correlation term is positive), then at step 2030, a 
minimization term is calculated as the energy of the ZERO-STATE response 
minus the cross-correlation term. In block 2030, the phrase "sign is negative" 
indicates block 2030 corresponds to the negative codevector. Thus, arriving at 
block 2030 indicates the negative codevector is the preferred one of the
negative and positive codevectors corresponding to the current shape
codevector (see Eq. 20 of Section IX.C.2.b. above).

[0337] On the other hand, if the cross-correlation term is negative, then at step 

2035, the minimization term is calculated as the energy of the ZERO-STATE 
response plus the cross-correlation term. In block 2035, the phrase "sign is 
positive" indicates block 2035 corresponds to the positive codevector. Thus, 
arriving at block 2035 indicates the positive codevector is the preferred one of 
the negative and positive codevectors corresponding to the current shape 
codevector. 

[0338] Next, steps 2040 and 2045 determine the best current minimization 

term among all of the minimization terms calculated so far, and also, identify 
the signed codevector associated with the best current minimization term. 

[0339] At a next step 2050, it is determined whether the last codevector in the 

shape codebook has been tested. If yes, then the search is completed and the 
preferred shape codevector and its sign have been determined. If no, then at 
step 2055, the next shape codevector to be tested in the shape codebook is 
identified. 

[0340] In an alternative arrangement of method 2000, it is not assumed that 

the ZERO-STATE responses and their corresponding energies have been 
precalculated. In this alternative arrangement, the ZERO-STATE response 
and ZERO-STATE energy corresponding to each shape codevector is 
calculated within each iteration of the search loop, using additional method 
steps. 

[0341] Assuming N iterations of the loop in method 2000, method 2000 

performs the following steps for each vector to be quantized: 
for each shape codevector 

(a) deriving a correlation term corresponding to the shape 
codevector where at least one filter structure of the NFC system has been used 
to generate the signals for the correlation (step 2020); 

(b) deriving a first minimization value corresponding to the 
positive codevector associated with the shape codevector when a sign of the 
correlation term is a first value (steps 2025 and 2030); and 

(c) deriving a second minimization value corresponding to
the negative codevector associated with the shape codevector when a sign of
the correlation term is a second value (steps 2025 and 2035); and

selecting a preferred codevector from among the positive and negative
codevectors corresponding to minimization values derived in steps (b) and (c)
based on the minimization values (steps 2040 and 2045).

[0342] Example methods 1900 and 2000 each derive a minimization term

corresponding to a codevector in each iteration of their respective search 
loops. In alternative arrangements of methods 1900 and 2000, all of the 
minimization terms may be calculated in a single step, followed by a single 
step search through all of these minimization terms to select the preferred 
minimization term, and corresponding codevector. 

5. Comparison of Search Method Complexities 

[0343] This section provides a summary and comparison of the number of 

floating point operations that is required to perform the L VQs in a master 
vector for the different methods. The comparison assumes that the same 
techniques are used to obtain the ZERO-INPUT response and ZERO-STATE 
responses for the different methods, and thus, that the complexity associated 
therewith is identical for the different methods. Consequently, this complexity 
is omitted from the estimated number of floating point operations. The 
different methods are mathematically equivalent, i.e., all are equivalent to an 
exhaustive search of the codevectors. The comparison is provided in Table 1, 
which lists the expression for the number of floating point operations as well 
as the number of floating point operations for the example narrowband and 
wideband NFC systems. In the table the first and second inventions are 
labeled "Pre-computation of energies of ZERO-STATE responses" and 
"signed codebook search", respectively. 

Table 1: Comparison of the number of floating point operations for the 
different methods. 

Method                          Application      Expression                                                  Example narrowband       Example wideband
                                                                                                             (L = 10, N = 32, K = 4)  (N = 64, K = 4)
------------------------------  ---------------  ----------------------------------------------------------  -----------------------  ----------------
Straightforward Method          Any codebook     $C_1 = L \cdot N \cdot K \cdot 2$                           2560                     5120

Pre-Computation of Energies of  Any codebook     $C_2 = N \cdot K + L \cdot N \cdot (K + 1)$                 1728                     3456
Zero-State Responses

Signed Codebook Search          Signed codebook  $C_3 = L \cdot N \cdot (K + 1/2)$                           1440                     2880

Pre-Computation of Energies of  Signed codebook  $C_4 = 1/2 \cdot (N \cdot K + L \cdot N \cdot (K + 1))$     864                      1728
Zero-State Responses and
Signed Codebook Search
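
The entries of Table 1 can be checked numerically. The sketch below evaluates the four expressions for the two example configurations; the function name `search_complexities` is hypothetical, and L = 10 is assumed for the wideband example because that value reproduces the tabulated counts.

```python
def search_complexities(L, N, K):
    """Floating point operation counts for the four search methods of Table 1.

    L: number of VQs, N: number of shape codevectors, K: vector dimension.
    Integer division is exact here as long as L * N is even.
    """
    c1 = L * N * K * 2                   # straightforward method
    c2 = N * K + L * N * (K + 1)         # pre-computed ZERO-STATE energies
    c3 = L * N * (2 * K + 1) // 2        # signed codebook search: L*N*(K + 1/2)
    c4 = (N * K + L * N * (K + 1)) // 2  # both techniques combined
    return c1, c2, c3, c4


print(search_complexities(10, 32, 4))  # narrowband: (2560, 1728, 1440, 864)
print(search_complexities(10, 64, 4))  # wideband, assuming L = 10: (5120, 3456, 2880, 1728)
```

In both examples the combined method needs roughly one third of the operations of the straightforward method (864 versus 2560, and 1728 versus 5120).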



[0344] It should be noted that the sign of the cross-correlation term in Eqs. 7, 

11, 16, 17, 18, 19, and 20 is opposite in some NFC systems due to alternate 
sign definitions of the signals. It is to be understood that this does not affect 
the present invention fundamentally, but will simply result in proper sign 
changes in the equations and methods of the invention. 



X. Decoder Operations 

[0345] The decoder in FIG. 8 is very similar to the decoder of other predictive 

codecs such as CELP and MPLPC. The operations of the decoder are well- 
known prior art. 

[0346] Refer to FIG. 8. The bit de-multiplexer block 100 unpacks the input 

bit stream into the five sets of indices LSPI, PPI, PPTI, GI, and CI. The long- 
term predictive parameter decoder block 110 decodes the pitch period as pp = 
17 + PPI. It also uses PPTI as the address to retrieve the corresponding 
codevector from the 9-dimensional pitch tap codebook and multiplies the first 
three elements of the codevector by 0.5 to get the three pitch predictor 
coefficients $\{ b_{p1}, b_{p2}, b_{p3} \}$. The decoded pitch period and pitch predictor taps 
are passed to the long-term predictor block 140. 
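
The parameter decoding performed by block 110 can be sketched as follows. This is an illustration only: the function name `decode_long_term_parameters` is hypothetical, and the codebook passed to it is a placeholder, since the actual contents of the 9-dimensional pitch tap codebook are not given in this section.

```python
def decode_long_term_parameters(ppi, ppti, pitch_tap_codebook):
    """Decode the pitch period and the three pitch predictor taps.

    ppi:  pitch period index (PPI)
    ppti: pitch predictor tap index (PPTI)
    pitch_tap_codebook: sequence of 9-dimensional codevectors (placeholder data)
    """
    pp = 17 + ppi                            # pitch period, as stated in the text
    codevector = pitch_tap_codebook[ppti]    # PPTI addresses the codebook
    taps = [0.5 * x for x in codevector[:3]] # first three elements, scaled by 0.5
    return pp, taps
```

For example, with a made-up codevector whose first three elements are 0.2, 1.0 and 0.4, the decoded taps are 0.1, 0.5 and 0.2.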

[0347] The short-term predictive parameter decoder block 120 decodes LSPI 

to get the quantized version of the vector of LSP inter-frame MA prediction 
residual. Then, it performs the same operations as in the right half of the 
structure in FIG. 10 to reconstruct the quantized LSP vector, as is well known 
in the art. Next, it performs the same operations as in blocks 17 and 18 to get 
the set of short-term predictor coefficients $\{ a_i \}$, which is passed to the short- 
term predictor block 160. 

[0348] The prediction residual quantizer decoder block 130 decodes the gain 

index GI to get the quantized version of the log-gain prediction residual. 
Then, it performs the same operations as in blocks 304, 307, 308, and 309 of 
FIG. 12 to get the quantized residual gain in the linear domain. Next, block 
130 uses the codebook index CI to retrieve the residual quantizer output level 
if a scalar quantizer is used, or the winning residual VQ codevector if a vector 
quantizer is used; it then scales the result by the quantized residual gain. The 
result of such scaling is the signal uq(n) in FIG. 8. 

[0349] The long-term predictor block 140 and the adder 150 together perform 

the long-term synthesis filtering to get the quantized version of the short-term 
prediction residual dq(n) as follows. 

$$dq(n) = uq(n) + \sum_{i=1}^{3} b_{pi} \, dq(n - pp + 2 - i)$$ 
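
The long-term synthesis filtering above can be sketched as a direct recursion. This is a minimal illustration, not the decoder's actual implementation: the function name `long_term_synthesis` and the explicit `history` buffer of past dq samples are assumptions made for the example.

```python
def long_term_synthesis(uq, pp, taps, history):
    """Compute dq(n) = uq(n) + sum_{i=1..3} taps[i-1] * dq(n - pp + 2 - i).

    uq:      scaled residual quantizer output for the current block
    pp:      decoded pitch period (must be at least 2 so that only past
             samples are referenced)
    taps:    the three pitch predictor coefficients
    history: past dq samples, most recent last, at least pp + 1 of them
    """
    if pp < 2 or len(history) < pp + 1:
        raise ValueError("need pp >= 2 and at least pp + 1 history samples")
    dq = list(history)
    offset = len(dq)
    for n, u in enumerate(uq):
        pos = offset + n
        acc = u
        for i in (1, 2, 3):
            # the three taps cover lags pp - 1, pp and pp + 1
            acc += taps[i - 1] * dq[pos - pp + 2 - i]
        dq.append(acc)
    return dq[offset:]
```

With an all-zero history, a unit impulse at n = 0 and pp = 17, the output shows the three taps at n = 16, 17 and 18, that is, at lags pp - 1, pp and pp + 1.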

[0350] The short-term predictor block 160 and the adder 170 then perform the 

short-term synthesis filtering to get the decoded output speech signal sq(n) as 

$$sq(n) = dq(n) + \sum_{i=1}^{M} a_i \, sq(n - i).$$ 
This completes the description of the decoder operations. 
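
The short-term synthesis filtering of block 160 and adder 170 can be sketched in the same way. The function name `short_term_synthesis` and the `history` buffer of past output samples are assumptions for illustration; the predictor order M is taken from the number of coefficients supplied.

```python
def short_term_synthesis(dq, coeffs, history):
    """Compute sq(n) = dq(n) + sum_{i=1..M} coeffs[i-1] * sq(n - i).

    dq:      quantized short-term prediction residual for the current block
    coeffs:  the M short-term predictor coefficients
    history: past sq samples, most recent last, at least M of them
    """
    order = len(coeffs)
    if len(history) < order:
        raise ValueError("need at least M history samples")
    sq = list(history)
    offset = len(sq)
    for n, d in enumerate(dq):
        pos = offset + n
        # prediction from the M most recent decoded output samples
        prediction = sum(coeffs[i - 1] * sq[pos - i] for i in range(1, order + 1))
        sq.append(d + prediction)
    return sq[offset:]
```

For a first-order predictor with coefficient 0.5 and zero history, the input [1, 0, 0] yields the decaying response [1, 0.5, 0.25].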

XI. Hardware and Software Implementations 

[0351] The following description of a general purpose computer system is 

provided for completeness. The present invention can be implemented in 
hardware, or as a combination of software and hardware. Consequently, the 
invention may be implemented in the environment of a computer system or 
other processing system. An example of such a computer system 2100 is 
shown in FIG. 21. In the present invention, all of the signal processing blocks 
of codecs 1050, 2050, and 3000-7000, for example, can execute on one or 
more distinct computer systems 2100, to implement the various methods of the 
present invention. The computer system 2100 includes one or more 
processors, such as processor 2104. Processor 2104 can be a special purpose 
or a general purpose digital signal processor. The processor 2104 is connected 
to a communication infrastructure 2106 (for example, a bus or network). 
Various software implementations are described in terms of this exemplary 
computer system. After reading this description, it will become apparent to a 
person skilled in the relevant art how to implement the invention using other 
computer systems and/or computer architectures. 

[0352] Computer system 2100 also includes a main memory 2108, preferably 

random access memory (RAM), and may also include a secondary memory 
2110. The secondary memory 2110 may include, for example, a hard disk 
drive 2112 and/or a removable storage drive 2114, representing a floppy disk 
drive, a magnetic tape drive, an optical disk drive, etc. The removable storage 
drive 2114 reads from and/or writes to a removable storage unit 2118 in a well 
known manner. Removable storage unit 2118 represents a floppy disk, 
magnetic tape, optical disk, etc. which is read by and written to by removable 
storage drive 2114. As will be appreciated, the removable storage unit 2118 
includes a computer usable storage medium having stored therein computer 
software and/or data. 

[0353] In alternative implementations, secondary memory 2110 may include 

other similar means for allowing computer programs or other instructions to 
be loaded into computer system 2100. Such means may include, for example, 
a removable storage unit 2122 and an interface 2120. Examples of such 
means may include a program cartridge and cartridge interface (such as that 
found in video game devices), a removable memory chip (such as an EPROM, 
or PROM) and associated socket, and other removable storage units 2122 and 
interfaces 2120 which allow software and data to be transferred from the 
removable storage unit 2122 to computer system 2100. 

[0354] Computer system 2100 may also include a communications interface 

2124. Communications interface 2124 allows software and data to be 
transferred between computer system 2100 and external devices. Examples of 
communications interface 2124 may include a modem, a network interface 
(such as an Ethernet card), a communications port, a PCMCIA slot and card, 
etc. Software and data transferred via communications interface 2124 are in 
the form of signals 2128 which may be electronic, electromagnetic, optical or 
other signals capable of being received by communications interface 2124. 
These signals 2128 are provided to communications interface 2124 via a 
communications path 2126. Communications path 2126 carries signals 2128 
and may be implemented using wire or cable, fiber optics, a phone line, a 
cellular phone link, an RF link and other communications channels. 

[0355] In this document, the terms "computer program medium" and 

"computer usable medium" are used to generally refer to media such as 
removable storage drive 2114, a hard disk installed in hard disk drive 2112, 
and signals 2128. These computer program products are means for providing 
software to computer system 2100. 

[0356] Computer programs (also called computer control logic) are stored in 

main memory 2108 and/or secondary memory 2110. Computer programs may 
also be received via communications interface 2124. Such computer 
programs, when executed, enable the computer system 2100 to implement the 
present invention as discussed herein. In particular, the computer programs, 
when executed, enable the processor 2104 to implement the processes of the 
present invention, such as the methods implemented using the various codec 
structures described above, such as methods 6050, 1350, 1364, 1430, 1450, 
1470, 1520, 1620, 1700, 1800, 1900, and 2000, for example. Accordingly, 
such computer programs represent controllers of the computer system 2100. 
By way of example, in the embodiments of the invention, the processes 
performed by the signal processing blocks of codecs/structures 1050, 2050, 
3000-7000, 1300, 1362, 1400, 1402a, 1404a, and 1404b can be performed by 
computer control logic. Where the invention is implemented using software, 
the software may be stored in a computer program product and loaded into 
computer system 2100 using removable storage drive 2114, hard drive 2112 or 
communications interface 2124. 
[0357] In another embodiment, features of the invention are implemented 

primarily in hardware using, for example, hardware components such as 
Application Specific Integrated Circuits (ASICs) and gate arrays. 
Implementation of a hardware state machine so as to perform the functions 
described herein will also be apparent to persons skilled in the relevant art(s). 

XII. Conclusion 

[0358] While various embodiments of the present invention have been 

described above, it should be understood that they have been presented by way 
of example, and not limitation. It will be apparent to persons skilled in the 
relevant art that various changes in form and detail can be made therein 
without departing from the spirit and scope of the invention. 

[0359] The present invention has been described above with the aid of 

functional building blocks and method steps illustrating the performance of 
specified functions and relationships thereof. The boundaries of these 
functional building blocks and method steps have been arbitrarily defined 
herein for the convenience of the description. Alternate boundaries can be 
defined so long as the specified functions and relationships thereof are 
appropriately performed. Any such alternate boundaries are thus within the 
scope and spirit of the claimed invention. One skilled in the art will recognize 
that these functional building blocks can be implemented by discrete 
components, application specific integrated circuits, processors executing 
appropriate software and the like or any combination thereof. Thus, the 
breadth and scope of the present invention should not be limited by any of the 
above-described exemplary embodiments, but should be defined only in 
accordance with the following claims and their equivalents. 



